Rework FW Plugins to prefetch only as many batches as needed #703
Signed-off-by: Serge Panev <spanev@nvidia.com>
@@ -113,6 +113,8 @@ def __init__(self,

# We need data about the batches (like shape information),
# so we need to run a single batch as part of setup to get that info
for p in self._pipes:
I don't know if moving this here adds any value, as we are calling self.next() anyway a few lines below.
`next` is now doing:
- `ShareOutputs`
- copy outputs to FW
- `ReleaseOutputs`
- `Run`

The very first `next` will be missing a `Run` to get the outputs from. It has to be done once in the ctor.
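To make the ordering concrete, here is a minimal sketch of why the constructor needs its own run. `FakePipeline` and `SketchIterator` are hypothetical stand-ins written for this illustration; they are not the plugin code or the pipeline API, and the "copy to FW" step is reduced to wrapping the output in a list.

```python
from collections import deque


class FakePipeline:
    """Hypothetical stand-in for a pipeline; not the real pipeline API."""

    def __init__(self):
        self._ready = deque()  # batches produced by run() and not yet released
        self._counter = 0      # number of runs scheduled so far

    def run(self):
        # Schedule one run; in this toy model it completes immediately
        # and produces a batch identified by its run index.
        self._ready.append(self._counter)
        self._counter += 1

    def share_outputs(self):
        # Expose the oldest ready batch; fails if no run was scheduled before.
        if not self._ready:
            raise RuntimeError("no outputs: run() was never scheduled")
        return self._ready[0]

    def release_outputs(self):
        # Give the shared buffer back to the pipeline.
        self._ready.popleft()


class SketchIterator:
    """Hypothetical iterator following the order described in the comment."""

    def __init__(self, pipe, first_run=True):
        self._pipe = pipe
        if first_run:
            # One run in the constructor so the first next() has outputs.
            self._pipe.run()

    def next(self):
        outputs = self._pipe.share_outputs()  # ShareOutputs
        batch = [outputs]                     # copy outputs to FW (toy copy)
        self._pipe.release_outputs()          # ReleaseOutputs
        self._pipe.run()                      # Run, feeding the following next()
        return batch
```

With `first_run=False` the very first `next()` raises, because nothing was scheduled before `share_outputs()` is called; with the constructor run in place, each `next()` finds exactly one batch waiting.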
I was asking if moving prefetching from next to ctor improves anything. I was not asking about getting rid of it.
We certainly needed to remove the call to `_prefetch`, as it was scheduling `queue_size` runs of the pipeline for every `next` while we consumed only one.
In `next` we consume the output, so we need to schedule a new `_run` after that. We also need to start a full run for the first iteration (with prefetching that fills the queues). Since this first run, which triggers the prefetching, must be called exactly once, we do it in the constructor.
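As a rough back-of-the-envelope check of the point above, the following sketch counts scheduled runs against consumed batches under both schemes. The function name and the model are assumptions made for illustration only: it ignores any back-pressure a real pipeline queue would apply and simply tallies calls.

```python
def scheduled_vs_consumed(queue_size, iterations, prefetch_every_next):
    """Count runs scheduled and batches consumed in a simplified model.

    Hypothetical helper, not part of the plugin code.
    """
    scheduled = 0
    consumed = 0
    if not prefetch_every_next:
        # Reworked scheme: fill the queues once, in the constructor.
        scheduled += queue_size
    for _ in range(iterations):
        if prefetch_every_next:
            # Previous scheme: every next() scheduled queue_size runs.
            scheduled += queue_size
        consumed += 1  # next() consumes exactly one batch
        if not prefetch_every_next:
            # Reworked scheme: one new run replaces the consumed batch.
            scheduled += 1
    return scheduled, consumed


old = scheduled_vs_consumed(queue_size=2, iterations=10, prefetch_every_next=True)
new = scheduled_vs_consumed(queue_size=2, iterations=10, prefetch_every_next=False)
print(old, new)
```

In this model the surplus of scheduled over consumed runs grows with every iteration when prefetching on each `next`, while it stays fixed at `queue_size` when prefetching once in the constructor.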
I'd like to see new L0 python tests:
I totally agree here
Build 687809
I'll submit the iterator tests in a separate PR
Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-author @klecki