
Per sample ExternalSource #2469

Merged
merged 5 commits into from Nov 26, 2020

Conversation

mzient
Contributor

@mzient mzient commented Nov 16, 2020

  • add batch argument that changes how source operates
  • update docs
  • add tests.

Signed-off-by: Michał Zientkiewicz mzient@gmail.com

Why we need this PR?

  • It adds a new feature needed as a prerequisite for the parallel external source

What happened in this PR?


  • What solution was applied:
    • Add batch argument
    • Add tests
  • Affected modules and functionalities:
    • External source, pipeline
  • Key points relevant for the review:
    • N/A
  • Validation and testing:
    • Python tests
  • Documentation (including examples):
    • Updated docs

JIRA TASK: DALI-1734

@mzient mzient requested review from stiepan and a team November 16, 2020 14:44
@mzient
Contributor Author

mzient commented Nov 16, 2020

!build

@@ -963,7 +963,7 @@ def _run_input_callbacks(self):
return

for group in self._input_callbacks:
group.call_and_feed(self, self._iter)
group.call_and_feed(self, self._batch_size, self._iter)
Member

@stiepan stiepan Nov 16, 2020

You mentioned this morning the idea of inferring the batch size from other external sources when there are a few of them and some work in batch mode (falling back to the pipeline batch size as a last resort when no external sources work in batch mode). Do you handle that, and do we want such a feature?

Contributor Author

@mzient mzient Nov 16, 2020

No, it's not a part of this PR.
I think we still haven't fully worked out how the batch size is going to be handled in Python.

Member

Ok!
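The diff above passes the batch size into call_and_feed. A minimal sketch (plain Python with hypothetical names, not DALI's actual implementation) of why a per-sample group needs it:

```python
# Hypothetical sketch of why call_and_feed now receives the batch size:
# in per-sample mode the group must call the source once per sample to
# assemble a full batch; in batch mode a single call suffices.
class CallbackGroup:
    def __init__(self, source, batch_mode):
        self.source = source          # user-supplied callable
        self.batch_mode = batch_mode  # True: source returns whole batches

    def call_and_feed(self, batch_size, iteration):
        if self.batch_mode:
            # batch mode: the source produces the whole batch itself
            return self.source(iteration)
        # per-sample mode: one call per sample index
        return [self.source(i) for i in range(batch_size)]

per_sample = CallbackGroup(lambda idx: idx * 10, batch_mode=False)
whole_batch = CallbackGroup(lambda it: [it] * 4, batch_mode=True)

print(per_sample.call_and_feed(4, 0))   # [0, 10, 20, 30]
print(whole_batch.call_and_feed(4, 2))  # [2, 2, 2, 2]
```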

@dali-automaton
Collaborator

CI MESSAGE: [1802766]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1802766]: BUILD PASSED

stiepan
stiepan previously approved these changes Nov 16, 2020
Member

@stiepan stiepan left a comment

Looks good to me:)

Member

@szalpal szalpal left a comment

Please also add test cases with the generator.


`batch` : optional
If set to ``True`` or ``None``, the ``source`` is expected to produce an entire batch at once.
If set to ``False``, the ``source`` is called per-sample. Its first parameter is sample index.
Member

How would you know the batch size in this case?

Contributor Author

The whole point is to not have to know it - you just produce consecutive samples and that's it.
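The point above can be illustrated with a short, DALI-free sketch: a per-sample source is a pure function of the sample index, and the caller, not the source, decides how many indices to request:

```python
# Plain-Python illustration (no DALI required): a per-sample source
# never sees the batch size -- it only maps an index to a sample.
def per_sample_source(sample_idx):
    # produce one sample purely from its index
    return [sample_idx, sample_idx * sample_idx]

# The pipeline decides the batch size; the source just produces
# consecutive samples on demand.
batch = [per_sample_source(i) for i in range(3)]
print(batch)  # [[0, 0], [1, 1], [2, 4]]
```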

"""

def __init__(self, source = None, num_outputs = None, *, cycle = None, layout = None, name = None, device = "cpu",
cuda_stream = None, use_copy_kernel = None, **kwargs):
cuda_stream = None, use_copy_kernel = None, batch = None, **kwargs):
Member

How about calling it batch_mode? Seems like a better description of what it does

Contributor Author

@mzient mzient Nov 17, 2020

I think we already have a boolean argument called batch, which makes sense and prevents inconsistent naming (like batch_mode, batch_processing, batch_whatever).

Contributor

It is for normalized, but I don't know if we can call it a corresponding example.

Contributor Author

In any case, the documentation is rather clear. I'll change batch : optional to batch : bool, optional.

@stiepan stiepan self-requested a review November 16, 2020 18:00
@stiepan stiepan dismissed their stale review November 16, 2020 18:03

As Krzysztof pointed out, what exactly should be passed to the callback is a tricky part. Maybe we should discuss it first.

@mzient
Contributor Author

mzient commented Nov 20, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1818768]: BUILD STARTED

@mzient
Contributor Author

mzient commented Nov 20, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1818899]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1818899]: BUILD PASSED

self.current_sample += batch_size
self.current_iter += 1
except StopIteration:
self.current_sample = 0
Member

Could it be a self.reset_indices() call?

Contributor Author

It could. I guess I won't re-run CI just for that, so I'll wait for another review - if there are some more changes requested, I'll adjust this one.
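A minimal sketch of the bookkeeping under discussion (hypothetical names mirroring the snippet above, not DALI's actual code): on exhaustion, both counters are rewound through a single helper instead of ad-hoc assignments:

```python
# Sketch of counting samples and iterations, with one reset_indices()
# helper used when the underlying data is exhausted.
class SourceState:
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.current_sample = 0
        self.current_iter = 0

    def reset_indices(self):
        # one place to rewind both counters
        self.current_sample = 0
        self.current_iter = 0

    def next_batch(self):
        if self.current_sample >= len(self.data):
            # analogous to handling StopIteration from a real source
            self.reset_indices()
            raise StopIteration
        end = self.current_sample + self.batch_size
        batch = self.data[self.current_sample:end]
        self.current_sample += self.batch_size
        self.current_iter += 1
        return batch

state = SourceState(list(range(4)), batch_size=2)
print(state.next_batch())  # [0, 1]
print(state.next_batch())  # [2, 3]
```

A third call raises StopIteration and leaves both counters back at zero, ready for the next epoch.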

mzient and others added 4 commits November 26, 2020 17:08
* add `batch` argument that changes how `source` operates
* update docs
* add tests.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Change how samples and iterations are counted.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
iteration number and consecutive calls will be ``source(0)``, ``source(1)``, and so on.
A per-sample source may accept a :class:`nvidia.dali.types.SampleInfo` structure.
Contributor

@JanuszL JanuszL Nov 26, 2020

May or should?
I think it is either this or nothing.

Contributor Author

May = if it takes a parameter, then it's that structure. But it may also take no parameters at all.
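This "may accept" behavior can be sketched without DALI by introspecting the callable; the SampleInfo fields below are illustrative stand-ins, not necessarily DALI's exact layout:

```python
import inspect
from collections import namedtuple

# Illustrative stand-in for nvidia.dali.types.SampleInfo
# (field names here are assumptions made for this sketch).
SampleInfo = namedtuple("SampleInfo", ["idx_in_epoch", "idx_in_batch", "iteration"])

def call_source(source, info):
    # If the callable takes a positional parameter, it receives the
    # info structure; otherwise it is called with no arguments at all.
    n_params = len(inspect.signature(source).parameters)
    return source(info) if n_params >= 1 else source()

with_info = lambda info: info.idx_in_epoch * 2
no_args = lambda: 7

print(call_source(with_info, SampleInfo(3, 1, 0)))  # 6
print(call_source(no_args, SampleInfo(3, 1, 0)))    # 7
```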

@@ -804,6 +804,9 @@ def reset(self):
self._first_iter = True
self._last_iter = False
self._iter = 0
if self._input_callbacks:
for group in self._input_callbacks:
group.reset_indices()
Contributor

Isn't that a breaking change? So far we haven't reset the supplied indices.

Contributor Author

@mzient mzient Nov 26, 2020

Maybe it is, but I seriously doubt anyone has ever used the callback with an argument - I don't think it's shown in any example.

is expected to return one batch. If this value is specified, the data is expected to be a tuple
or list, where each element corresponds to the respective return value of the external_source.
If the source is a callable that accepts a positional argument, it is assumed to be the current
the source can supply one or more data items. The data item can be a whole batch (default) or
Contributor

As we previously discussed, I would put the list of supported types here as well; I bet no one checks feed_input looking for it.

Contributor Author

Done.

- use `reset_indices()` on `StopIteration`
- improve documentation.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Contributor Author

mzient commented Nov 26, 2020

!build

@@ -199,8 +242,7 @@ class ExternalSource():
Used when feeding the data in ``iter_setup`` and can be omitted if
the data is provided by ``source``.


Contributor Author

Docs were broken by this extra newline...

@dali-automaton
Collaborator

CI MESSAGE: [1837148]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1837148]: BUILD PASSED

@mzient mzient merged commit 8c77461 into NVIDIA:master Nov 26, 2020
5 participants