-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Per sample ExternalSource #2469
Conversation
!build |
dali/python/nvidia/dali/pipeline.py
Outdated
@@ -963,7 +963,7 @@ def _run_input_callbacks(self): | |||
return | |||
|
|||
for group in self._input_callbacks: | |||
group.call_and_feed(self, self._iter) | |||
group.call_and_feed(self, self._batch_size, self._iter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned in the morning the idea of inferring batch size from other externalsources if there are a few of them and some work in batch mode (falling back to pipeline batch size as a last resort when there are no external sources working in batch mode) - do you handle that/do we want such a feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not a part of this PR.
I think we still haven't fully worked out how the batch size is going to be handled in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
CI MESSAGE: [1802766]: BUILD STARTED |
CI MESSAGE: [1802766]: BUILD PASSED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add also test cases with the generator
|
||
`batch` : optional | ||
If set to ``True`` or ``None``, the ``source`` is expected to produce an entire batch at once. | ||
If set to ``False``, the ``source`` is called per-sample. Its first parameter is sample index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you know the batch size in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point is to not have to know it - you just produce consecutive samples and that's it.
""" | ||
|
||
def __init__(self, source = None, num_outputs = None, *, cycle = None, layout = None, name = None, device = "cpu", | ||
cuda_stream = None, use_copy_kernel = None, **kwargs): | ||
cuda_stream = None, use_copy_kernel = None, batch = None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about calling it batch_mode
? Seems like a better description of what it does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have a boolean argument called batch
- which makes sense and prevents inconsistent naming (like batch_mode, batch_processing, batch_whatever)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is for normalized, but I don't know if we can call it a corresponding example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, the documentation is rather clear. I'll change batch : optional
to batch : bool, optional
.
As Krzysztof pointed out, what should exactly be passed to the callback is tricky part. Maybe we should discuss it first.
!build |
CI MESSAGE: [1818768]: BUILD STARTED |
!build |
CI MESSAGE: [1818899]: BUILD STARTED |
CI MESSAGE: [1818899]: BUILD PASSED |
self.current_sample += batch_size | ||
self.current_iter += 1 | ||
except StopIteration: | ||
self.current_sample = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be self.reset_indices() call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could. I guess I won't re-run CI just for that, so I'll wait for another review - if there are some more changes requested, I'll adjust this one.
* add `batch` argument that changes how `source` operates * update docs * add tests. Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Change how samples and iterations are counted. Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
iteration number and consecutive calls will be ``source(0)``, ``source(1)``, and so on. | ||
A per-sample source may accept a :class:`nvidia.dali.types.SampleInfo` structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May or should?
I think it is either this or nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May = if it takes a parameter, then it's that structure. But it may also take no parameters at all.
@@ -804,6 +804,9 @@ def reset(self): | |||
self._first_iter = True | |||
self._last_iter = False | |||
self._iter = 0 | |||
if self._input_callbacks: | |||
for group in self._input_callbacks: | |||
group.reset_indices() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that a breaking change? So far we haven't reset the supplied indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is, but I seriously doubt anyone has ever used the callback with argument - I don't think it's shown in any example.
is expected to return one batch. If this value is specified, the data is expected to a be tuple, | ||
or list, where each element corresponds to respective return value of the external_source. | ||
If the source is a callable that accepts a positional argument, it is assumed to be the current | ||
the source can supply one or more data items. The data item can be a whole batch (default) or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we previously talked I would put the list of supported types here as well, I bet no one check fee_inputs looking for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
- use `reset_indices()` on `StopIteration` - improve documentation. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
!build |
@@ -199,8 +242,7 @@ class ExternalSource(): | |||
Used when feeding the data in ``iter_setup`` and can be omitted if | |||
the data is provided by ``source``. | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs were broken with this extra new line...
CI MESSAGE: [1837148]: BUILD STARTED |
CI MESSAGE: [1837148]: BUILD PASSED |
batch
argument that changes howsource
operatesSigned-off-by: Michał Zientkiewicz mzient@gmail.com
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
batch
argumentJIRA TASK: DALI-1734