
Rework FW Plugins to prefetch only as many batches as needed #703

Merged
merged 1 commit into NVIDIA:master on Mar 28, 2019

Conversation

@Kh4L Kh4L (Contributor) commented Mar 27, 2019

Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-author: @klecki
@@ -113,6 +113,8 @@ def __init__(self,

# We need data about the batches (like shape information),
# so we need to run a single batch as part of setup to get that info
for p in self._pipes:
Contributor

I don't know if moving this here gives any value, as we are calling self.next() anyway a few lines below.

Contributor Author

next is now doing:

  • ShareOutputs
  • copy outputs to FW
  • ReleaseOutputs
  • Run

The very first next would otherwise have no Run to get the outputs from, so one Run has to be done once in the ctor.
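
For illustration, a minimal sketch of that next flow in Python. `_run` is mentioned later in this thread; `_share_outputs`, `_release_outputs`, and `_to_framework` are hypothetical stand-ins for the ShareOutputs, ReleaseOutputs, and copy-to-FW steps listed above, not the actual plugin code:

```python
def __next__(self):
    batches = []
    for p in self._pipes:
        # ShareOutputs: access the outputs of a previously scheduled Run
        # directly from the pipeline's buffers.
        outputs = p._share_outputs()
        # Copy the shared outputs into framework (FW) tensors.
        batches.append(self._to_framework(outputs))
        # ReleaseOutputs: hand the buffers back to the pipeline.
        p._release_outputs()
        # Run: schedule exactly one new run to replace the batch just
        # consumed, keeping the queue depth constant.
        p._run()
    return batches
```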

Contributor

I was asking if moving prefetching from next to ctor improves anything. I was not asking about getting rid of it.

Contributor

We certainly needed to remove the call to _prefetch, as it was scheduling queue_size runs of the pipeline for every next while we consumed only one.

In next we consume one output, so we need to schedule a new _run after that. We also need to start a full run for the first iteration (with prefetching, which fills the queues). As this first run that causes the prefetching needs to happen only once, we do it in the constructor.
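
A sketch of that constructor side under the same assumptions (hypothetical _run; the real code also runs a single batch during setup to collect shape information, as the diff above shows):

```python
def __init__(self, pipelines):
    self._pipes = pipelines
    # One full run per pipeline, done once here instead of a
    # _prefetch (queue_size runs) on every next(): with pipelined
    # execution this first run fills the prefetch queues, so the
    # first __next__ finds outputs ready to be shared.
    for p in self._pipes:
        p._run()
```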

@jantonguirao (Contributor)

I'd like to see new L0 python tests:

  1. covering all the iterators going around some decent amount of data
  2. training one epoch on a small dataset for every framework

At least (1), to make sure that this fix works.

@Kh4L Kh4L (Contributor Author) commented Mar 28, 2019


I totally agree. I am planning to add them in a follow-up PR, but running and training for 3 epochs on a small dataset, to see how the iterator handles epoch incrementing.
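
A rough sketch of what such a 3-epoch test could look like, with a hypothetical make_small_pipeline helper and the PyTorch plugin's DALIGenericIterator standing in for whichever framework iterator is under test:

```python
from nvidia.dali.plugin.pytorch import DALIGenericIterator

def test_three_epochs():
    # make_small_pipeline is a hypothetical helper that builds a tiny
    # pipeline over a small dataset.
    pipe = make_small_pipeline(batch_size=4)
    dali_iter = DALIGenericIterator([pipe], ["data"], size=32)
    for epoch in range(3):
        batches = sum(1 for _ in dali_iter)
        # 32 samples at batch_size 4 -> 8 batches per epoch
        assert batches == 8
        dali_iter.reset()  # rewind the iterator for the next epoch
```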

@Kh4L Kh4L changed the title [WIP] Rework FW Plugins to prefetch only as many batches as needed Rework FW Plugins to prefetch only as many batches as needed Mar 28, 2019
@Kh4L Kh4L (Contributor Author) commented Mar 28, 2019

Build 687809

@jantonguirao (Contributor)

I'll submit the iterator tests in a separate PR

@Kh4L Kh4L merged commit be8e676 into NVIDIA:master Mar 28, 2019
haoxintong pushed a commit to haoxintong/DALI that referenced this pull request Jul 16, 2019

PyTorch and MXNet iterators were calling `prefetch` on every iteration, which caused `queue_depth` runs to be scheduled on each iteration. This reworks the iterators to call only the necessary number of runs.