
Rework FW Plugins to prefetch only as many batches as needed #703

Merged
merged 1 commit into NVIDIA:master on Mar 28, 2019

Conversation

@Kh4L Kh4L (Contributor) commented Mar 27, 2019

Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-author: @klecki
@@ -113,6 +113,8 @@ def __init__(self,

# We need data about the batches (like shape information),
# so we need to run a single batch as part of setup to get that info
for p in self._pipes:
Contributor

I don't know if moving this here gives any value, as we are calling self.next() anyway a few lines below.

Contributor Author

next is now doing:

  • ShareOutputs
  • copy outputs to FW
  • ReleaseOutputs
  • Run

The very first next would otherwise have no Run to get the outputs from, so one Run has to be done once in the ctor.
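
For illustration, a minimal sketch of that next flow in Python. `_run` is mentioned later in this thread; `_share_outputs`, `_release_outputs`, and `_to_framework` are hypothetical stand-ins for the ShareOutputs, ReleaseOutputs, and copy-to-FW steps listed above, not the actual plugin code:

```python
def __next__(self):
    batches = []
    for p in self._pipes:
        # ShareOutputs: access the outputs of a previously scheduled Run
        # directly from the pipeline's buffers.
        outputs = p._share_outputs()
        # Copy the shared outputs into framework (FW) tensors.
        batches.append(self._to_framework(outputs))
        # ReleaseOutputs: hand the buffers back to the pipeline.
        p._release_outputs()
        # Run: schedule exactly one new run to replace the batch just
        # consumed, keeping the queue depth constant.
        p._run()
    return batches
```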

Contributor

I was asking if moving prefetching from next to ctor improves anything. I was not asking about getting rid of it.

Contributor

We certainly needed to remove the call to _prefetch, as it was scheduling queue_size runs of the pipeline for every next while we consumed only one.

In next we consume one output, so we need to schedule a new _run after that. We also need to start a full run for the first iteration (with prefetching, which fills the queues). As this first run that causes the prefetching needs to happen only once, we do it in the constructor.
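
A sketch of that constructor side under the same assumptions (hypothetical _run; the real code also runs a single batch during setup to collect shape information, as the diff above shows):

```python
def __init__(self, pipelines):
    self._pipes = pipelines
    # One full run per pipeline, done once here instead of a
    # _prefetch (queue_size runs) on every next(): with pipelined
    # execution this first run fills the prefetch queues, so the
    # first __next__ finds outputs ready to be shared.
    for p in self._pipes:
        p._run()
```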

@jantonguirao (Contributor)

I'd like to see new L0 python tests:

  1. covering all the iterators going around some decent amount of data
  2. training one epoch on a small dataset for every framework

At least (1), to make sure that this fix works.

@Kh4L Kh4L (Contributor Author) commented Mar 28, 2019


I totally agree. I am planning to add them in a follow-up PR, but running and training for 3 epochs on a small dataset, to see how the iterator handles epoch incrementing.
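
A rough sketch of what such a 3-epoch test could look like, with a hypothetical make_small_pipeline helper and the PyTorch plugin's DALIGenericIterator standing in for whichever framework iterator is under test:

```python
from nvidia.dali.plugin.pytorch import DALIGenericIterator

def test_three_epochs():
    # make_small_pipeline is a hypothetical helper that builds a tiny
    # pipeline over a small dataset.
    pipe = make_small_pipeline(batch_size=4)
    dali_iter = DALIGenericIterator([pipe], ["data"], size=32)
    for epoch in range(3):
        batches = sum(1 for _ in dali_iter)
        # 32 samples at batch_size 4 -> 8 batches per epoch
        assert batches == 8
        dali_iter.reset()  # rewind the iterator for the next epoch
```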

@Kh4L Kh4L changed the title [WIP] Rework FW Plugins to prefetch only as many batches as needed Rework FW Plugins to prefetch only as many batches as needed Mar 28, 2019
@Kh4L Kh4L (Contributor Author) commented Mar 28, 2019

Build 687809

@jantonguirao (Contributor)

I'll submit the iterator tests in a separate PR

@Kh4L Kh4L merged commit be8e676 into NVIDIA:master Mar 28, 2019
haoxintong pushed a commit to haoxintong/DALI that referenced this pull request Jul 16, 2019

PyTorch and MXNet iterators were calling `prefetch` on every iteration, which caused `queue_depth` runs to be scheduled on each iteration. This reworks the iterators to call only the necessary number of runs.