Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add epoch number to SampleInfo and introduce BatchInfo #3420

Merged
merged 6 commits into from
Oct 14, 2021

Conversation

stiepan
Copy link
Member

@stiepan stiepan commented Oct 12, 2021

Description

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (Redesign of existing code that doesn't affect functionality)
  • Other (e.g. Documentation, Tests, Configuration)

Add epoch number to SampleInfo class whose instances are passed to callbacks in external source in sample mode.
Add SampleInfo analog for batch mode.

Additional information

  • Affected modules and functionalities:
  • Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A
EXTSRC.02

JIRA TASK: N/A
DALI-2406

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Oct 12, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3173809]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3173809]: BUILD PASSED

@stiepan stiepan marked this pull request as ready for review October 13, 2021 08:22
@klecki klecki self-assigned this Oct 13, 2021
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@jantonguirao jantonguirao self-assigned this Oct 13, 2021


def test_epoch_idx():
workers_num = 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
workers_num = 4
num_workers = 4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for epoch_size in (1, 3, 7):
for reader_queue_depth in (1, 5):
sample_cb = SampleCb(batch_size, epoch_size)
yield _test_epoch_idx, batch_size, epoch_size, sample_cb, workers_num, prefetch_queue_depth, reader_queue_depth
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
yield _test_epoch_idx, batch_size, epoch_size, sample_cb, workers_num, prefetch_queue_depth, reader_queue_depth
yield _test_epoch_idx, batch_size, epoch_size, sample_cb, num_workers, prefetch_queue_depth, reader_queue_depth

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -110,7 +110,7 @@ def test_pool_scheduled_tasks_cleanup(start_method):
with create_pool(callbacks, queue_depth=depth, num_workers=1, start_method=start_method) as pool:
pids = get_pids(pool)
pid = pids[0]
tasks_list = [(i, [(SampleInfo(i, 0, i),)]) for i in range(depth)]
tasks_list = [(i, [(SampleInfo(i, 0, i, 0),)]) for i in range(depth)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tasks_list = [(i, [(SampleInfo(i, 0, i, 0),)]) for i in range(depth)]
task_list = [(i, [(SampleInfo(i, 0, i, 0),)]) for i in range(depth)]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
for sample in batch:
epoch_data[np.array(sample)[0]] = True
assert sum(epoch_data) == epoch_size * batch_size, \
"Epoch number {} did not contian some samples from data set".format(epoch_idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Epoch number {} did not contian some samples from data set".format(epoch_idx)
"Epoch number {} did not contain some samples from data set".format(epoch_idx)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@klecki klecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating callbacks and a bit similar tests, don't know if it's worth to extract if it will be only small portion of code. Otherwise mostly ok.

self.current_iter + lead)
self.current_iter + lead,
epoch_idx)
elif self.batch_info:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either I missed it, or we never set self.batch_info to False, we keep the default as None, which is Falsy, but not really a False.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added batch_info = self._batch_info or False in external_source.py:475

elif self.batch_info:
arg = nvidia.dali.types.BatchInfo(
self.current_iter + lead,
epoch_idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the current_iter and current_sample are members of self and tracked by the ExternalSourceGroup, but the epoch_idx is not. I guess this is fine and better from the point of scheduling, as the Pipeline keeps the epoch number.
I think I answered my question before I even asked it.

Any thoughts on moving them out of here or moving epoch in here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it crossed my mind as well - in fact I started with epoch_idx in external source but thought it's something common for all sources in the pipeline. Inspecting the code I believe that we could remove current_iter from external source and use pipeline._iter instead: prefetching doesn't increase the value, it seems that group.current_iter and pipline._iter remain (thankfully) in sync. I think I'll go with removing group._current_iter in favor of pipeline._iter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand current_sample doesn't seem to belong to the pipeline really, so we'd still have some discrepancies of what is managed where.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, most importantly passing iteration number from outside makes little sense in parallel mode due to prefetching. In that case the next prefetched batch is returned so argument wouldn't really control what's returned.

@@ -242,3 +242,110 @@ def test_discard():
sample_in_epoch = 0
iteration = 0
pipe.reset()


class SampleCb:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing some repetition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same callback, but there are no common utils for those tests suits, so I didn't want to introduce another dependency just for this case. Abstracting test cases itself - even though there are some similarities - seemed counterproductive too as we would end up with util with many parameters and ifs, making it less clear if the test actually tests what it is supposed to. If you are fine with it, I would stay with this level of repetition.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Oct 13, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3179682]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3179682]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3179682]: BUILD PASSED

@stiepan stiepan merged commit 1477f6d into NVIDIA:main Oct 14, 2021
cyyever pushed a commit to cyyever/DALI that referenced this pull request Oct 17, 2021
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Feb 21, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022
* Add epoch index to SampleInfo passed to sample mode callbacks in ES
* Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants