-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add epoch number to SampleInfo and introduce BatchInfo #3420
Conversation
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
!build |
CI MESSAGE: [3173809]: BUILD STARTED |
CI MESSAGE: [3173809]: BUILD PASSED |
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
|
||
|
||
def test_epoch_idx(): | ||
workers_num = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
workers_num = 4 | |
num_workers = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
for epoch_size in (1, 3, 7): | ||
for reader_queue_depth in (1, 5): | ||
sample_cb = SampleCb(batch_size, epoch_size) | ||
yield _test_epoch_idx, batch_size, epoch_size, sample_cb, workers_num, prefetch_queue_depth, reader_queue_depth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yield _test_epoch_idx, batch_size, epoch_size, sample_cb, workers_num, prefetch_queue_depth, reader_queue_depth | |
yield _test_epoch_idx, batch_size, epoch_size, sample_cb, num_workers, prefetch_queue_depth, reader_queue_depth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/test/python/test_pool.py
Outdated
@@ -110,7 +110,7 @@ def test_pool_scheduled_tasks_cleanup(start_method): | |||
with create_pool(callbacks, queue_depth=depth, num_workers=1, start_method=start_method) as pool: | |||
pids = get_pids(pool) | |||
pid = pids[0] | |||
tasks_list = [(i, [(SampleInfo(i, 0, i),)]) for i in range(depth)] | |||
tasks_list = [(i, [(SampleInfo(i, 0, i, 0),)]) for i in range(depth)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tasks_list = [(i, [(SampleInfo(i, 0, i, 0),)]) for i in range(depth)] | |
task_list = [(i, [(SampleInfo(i, 0, i, 0),)]) for i in range(depth)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
for sample in batch: | ||
epoch_data[np.array(sample)[0]] = True | ||
assert sum(epoch_data) == epoch_size * batch_size, \ | ||
"Epoch number {} did not contian some samples from data set".format(epoch_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Epoch number {} did not contian some samples from data set".format(epoch_idx) | |
"Epoch number {} did not contain some samples from data set".format(epoch_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeating callbacks and a bit similar tests, don't know if it's worth to extract if it will be only small portion of code. Otherwise mostly ok.
self.current_iter + lead) | ||
self.current_iter + lead, | ||
epoch_idx) | ||
elif self.batch_info: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either I missed it, or we never set self.batch_info to False, we keep the default as None, which is Falsy, but not really a False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added batch_info = self._batch_info or False
in external_source.py:475
elif self.batch_info: | ||
arg = nvidia.dali.types.BatchInfo( | ||
self.current_iter + lead, | ||
epoch_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the current_iter and current_sample are members of self
and tracked by the ExternalSourceGroup, but the epoch_idx is not. I guess this is fine and better from the point of scheduling, as the Pipeline keeps the epoch number.
I think I answered my question before I even asked it.
Any thoughts on moving them out of here or moving epoch in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it crossed my mind as well - in fact I started with epoch_idx in external source but thought it's something common for all sources in the pipeline. Inspecting the code I believe that we could remove current_iter from external source and use pipeline._iter instead: prefetching doesn't increase the value, it seems that group.current_iter and pipline._iter remain (thankfully) in sync. I think I'll go with removing group._current_iter in favor of pipeline._iter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand current_sample
doesn't seem to belong to the pipeline really, so we'd still have some discrepancies of what is managed where.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, most importantly passing iteration number from outside makes little sense in parallel mode due to prefetching. In that case the next prefetched batch is returned so argument wouldn't really control what's returned.
@@ -242,3 +242,110 @@ def test_discard(): | |||
sample_in_epoch = 0 | |||
iteration = 0 | |||
pipe.reset() | |||
|
|||
|
|||
class SampleCb: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing some repetition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same callback, but there are no common utils for those tests suits, so I didn't want to introduce another dependency just for this case. Abstracting test cases itself - even though there are some similarities - seemed counterproductive too as we would end up with util with many parameters and ifs, making it less clear if the test actually tests what it is supposed to. If you are fine with it, I would stay with this level of repetition.
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
!build |
CI MESSAGE: [3179682]: BUILD STARTED |
CI MESSAGE: [3179682]: BUILD FAILED |
CI MESSAGE: [3179682]: BUILD PASSED |
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* Add epoch index to SampleInfo passed to sample mode callbacks in ES * Introduce BatchInfo (a batch mode SampleInfo's counterpart) containing iteration number and epoch index. Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Description
Add epoch number to SampleInfo class whose instances are passed to callbacks in external source in sample mode.
Add SampleInfo analog for batch mode.
Additional information
Checklist
Tests
Documentation
DALI team only
Requirements
REQ IDs: N/A
EXTSRC.02
JIRA TASK: N/A
DALI-2406