
Per sample ExternalSource #2469

Merged
merged 5 commits into from Nov 26, 2020

Conversation

mzient
Contributor

@mzient mzient commented Nov 16, 2020

  • add batch argument that changes how source operates
  • update docs
  • add tests.

Signed-off-by: Michał Zientkiewicz mzient@gmail.com

Why we need this PR?

  • It adds a new feature needed as a prerequisite for the parallel external source

What happened in this PR?


  • What solution was applied:
    • Add batch argument
    • Add tests
  • Affected modules and functionalities:
    • External source, pipeline
  • Key points relevant for the review:
    • N/A
  • Validation and testing:
    • Python tests
  • Documentation (including examples):
    • Updated docs

JIRA TASK: DALI-1734

@mzient mzient requested review from stiepan and a team November 16, 2020 14:44
@mzient
Contributor Author

mzient commented Nov 16, 2020

!build

@@ -963,7 +963,7 @@ def _run_input_callbacks(self):
return

for group in self._input_callbacks:
group.call_and_feed(self, self._iter)
group.call_and_feed(self, self._batch_size, self._iter)
Member

@stiepan stiepan Nov 16, 2020

You mentioned this morning the idea of inferring the batch size from other external sources when there are a few of them and some work in batch mode (falling back to the pipeline batch size as a last resort when no external sources work in batch mode). Do you handle that, and do we want such a feature?

Contributor Author

@mzient mzient Nov 16, 2020

No, it's not a part of this PR.
I think we still haven't fully worked out how the batch size is going to be handled in Python.

Member

Ok!
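The diff above passes the batch size into call_and_feed. A minimal sketch (plain Python with hypothetical names, not DALI's actual implementation) of why a per-sample group needs it:

```python
# Hypothetical sketch of why call_and_feed now receives the batch size:
# in per-sample mode the group must call the source once per sample to
# assemble a full batch; in batch mode a single call suffices.
class CallbackGroup:
    def __init__(self, source, batch_mode):
        self.source = source          # user-supplied callable
        self.batch_mode = batch_mode  # True: source returns whole batches

    def call_and_feed(self, batch_size, iteration):
        if self.batch_mode:
            # batch mode: the source produces the whole batch itself
            return self.source(iteration)
        # per-sample mode: one call per sample index
        return [self.source(i) for i in range(batch_size)]

per_sample = CallbackGroup(lambda idx: idx * 10, batch_mode=False)
whole_batch = CallbackGroup(lambda it: [it] * 4, batch_mode=True)

print(per_sample.call_and_feed(4, 0))   # [0, 10, 20, 30]
print(whole_batch.call_and_feed(4, 2))  # [2, 2, 2, 2]
```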

@dali-automaton
Collaborator

CI MESSAGE: [1802766]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1802766]: BUILD PASSED

stiepan
stiepan previously approved these changes Nov 16, 2020
Member

@stiepan stiepan left a comment

Looks good to me:)

Member

@szalpal szalpal left a comment

Please also add test cases with the generator.


`batch` : optional
If set to ``True`` or ``None``, the ``source`` is expected to produce an entire batch at once.
If set to ``False``, the ``source`` is called per-sample. Its first parameter is sample index.
Member

How would you know the batch size in this case?

Contributor Author

The whole point is to not have to know it - you just produce consecutive samples and that's it.
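The point above can be illustrated with a short, DALI-free sketch: a per-sample source is a pure function of the sample index, and the caller, not the source, decides how many indices to request:

```python
# Plain-Python illustration (no DALI required): a per-sample source
# never sees the batch size -- it only maps an index to a sample.
def per_sample_source(sample_idx):
    # produce one sample purely from its index
    return [sample_idx, sample_idx * sample_idx]

# The pipeline decides the batch size; the source just produces
# consecutive samples on demand.
batch = [per_sample_source(i) for i in range(3)]
print(batch)  # [[0, 0], [1, 1], [2, 4]]
```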

"""

def __init__(self, source = None, num_outputs = None, *, cycle = None, layout = None, name = None, device = "cpu",
cuda_stream = None, use_copy_kernel = None, **kwargs):
cuda_stream = None, use_copy_kernel = None, batch = None, **kwargs):
Member

How about calling it batch_mode? Seems like a better description of what it does

Contributor Author

@mzient mzient Nov 17, 2020

I think we already have a boolean argument called batch, which makes sense and prevents inconsistent naming (like batch_mode, batch_processing, batch_whatever).

Contributor

It is for normalized, but I don't know if we can call it a corresponding example.

Contributor Author

In any case, the documentation is rather clear. I'll change batch : optional to batch : bool, optional.

@stiepan stiepan self-requested a review November 16, 2020 18:00
@stiepan stiepan dismissed their stale review November 16, 2020 18:03

As Krzysztof pointed out, what exactly should be passed to the callback is a tricky part. Maybe we should discuss it first.

@mzient
Contributor Author

mzient commented Nov 20, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1818768]: BUILD STARTED

@mzient
Contributor Author

mzient commented Nov 20, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1818899]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1818899]: BUILD PASSED

self.current_sample += batch_size
self.current_iter += 1
except StopIteration:
self.current_sample = 0
Member

Could it be a self.reset_indices() call?

Contributor Author

It could. I guess I won't re-run CI just for that, so I'll wait for another review - if there are some more changes requested, I'll adjust this one.
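A minimal sketch of the bookkeeping under discussion (hypothetical names mirroring the snippet above, not DALI's actual code): on exhaustion, both counters are rewound through a single helper instead of ad-hoc assignments:

```python
# Sketch of counting samples and iterations, with one reset_indices()
# helper used when the underlying data is exhausted.
class SourceState:
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.current_sample = 0
        self.current_iter = 0

    def reset_indices(self):
        # one place to rewind both counters
        self.current_sample = 0
        self.current_iter = 0

    def next_batch(self):
        if self.current_sample >= len(self.data):
            # analogous to handling StopIteration from a real source
            self.reset_indices()
            raise StopIteration
        end = self.current_sample + self.batch_size
        batch = self.data[self.current_sample:end]
        self.current_sample += self.batch_size
        self.current_iter += 1
        return batch

state = SourceState(list(range(4)), batch_size=2)
print(state.next_batch())  # [0, 1]
print(state.next_batch())  # [2, 3]
```

A third call raises StopIteration and leaves both counters back at zero, ready for the next epoch.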

mzient and others added 4 commits November 26, 2020 17:08
* add `batch` argument that changes how `source` operates
* update docs
* add tests.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Change how samples and iterations are counted.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
iteration number and consecutive calls will be ``source(0)``, ``source(1)``, and so on.
A per-sample source may accept a :class:`nvidia.dali.types.SampleInfo` structure.
Contributor

@JanuszL JanuszL Nov 26, 2020

May or should?
I think it is either this or nothing.

Contributor Author

May = if it takes a parameter, then it's that structure. But it may also take no parameters at all.
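This "may accept" behavior can be sketched without DALI by introspecting the callable; the SampleInfo fields below are illustrative stand-ins, not necessarily DALI's exact layout:

```python
import inspect
from collections import namedtuple

# Illustrative stand-in for nvidia.dali.types.SampleInfo
# (field names here are assumptions made for this sketch).
SampleInfo = namedtuple("SampleInfo", ["idx_in_epoch", "idx_in_batch", "iteration"])

def call_source(source, info):
    # If the callable takes a positional parameter, it receives the
    # info structure; otherwise it is called with no arguments at all.
    n_params = len(inspect.signature(source).parameters)
    return source(info) if n_params >= 1 else source()

with_info = lambda info: info.idx_in_epoch * 2
no_args = lambda: 7

print(call_source(with_info, SampleInfo(3, 1, 0)))  # 6
print(call_source(no_args, SampleInfo(3, 1, 0)))    # 7
```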

@@ -804,6 +804,9 @@ def reset(self):
self._first_iter = True
self._last_iter = False
self._iter = 0
if self._input_callbacks:
for group in self._input_callbacks:
group.reset_indices()
Contributor

Isn't that a breaking change? So far we haven't reset the supplied indices.

Contributor Author

@mzient mzient Nov 26, 2020

Maybe it is, but I seriously doubt anyone has ever used the callback with an argument - I don't think it's shown in any example.

is expected to return one batch. If this value is specified, the data is expected to be a tuple
or list, where each element corresponds to the respective return value of the external_source.
If the source is a callable that accepts a positional argument, it is assumed to be the current
the source can supply one or more data items. The data item can be a whole batch (default) or
Contributor

As we previously discussed, I would put the list of supported types here as well; I bet no one checks feed_input looking for it.

Contributor Author

Done.

- use `reset_indices()` on `StopIteration`
- improve documentation.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Contributor Author

mzient commented Nov 26, 2020

!build

@@ -199,8 +242,7 @@ class ExternalSource():
Used when feeding the data in ``iter_setup`` and can be omitted if
the data is provided by ``source``.


Contributor Author

Docs were broken by this extra newline...

@dali-automaton
Collaborator

CI MESSAGE: [1837148]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1837148]: BUILD PASSED

@mzient mzient merged commit 8c77461 into NVIDIA:master Nov 26, 2020
5 participants