-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colflow: introduce flow coordinator and simplify materializer #64697
Conversation
afa2482
to
d9fd137
Compare
I still need to run the benchmarks, but I'm hopeful that this will be a slight improvement for write-heavy workloads (like KV0) given that we remove some redundant stuff. |
bfce61e
to
436de34
Compare
Alright, I think I'm done force-pushing. The benchmarks are still missing, but otherwise it is RFAL. |
41d6a7b
to
0d41a6a
Compare
kv0 benchmarks don't show much difference (maybe I'm not pushing the benchmark hard enough):
In any case, I believe this to be a beneficial change overall (from the tech debt perspective). |
Assigning @michae2 as the main reviewer, but it'd be great to get input from @jordanlewis too. |
Previously, the output of a processor was embedded in `ProcOutputHelper`. This commit moves it out into the `ProcessorBase` because the follow-up commit will take advantage of such placement. Release note: None
This commit renames `ProcessorBase.Out` to `ProcessorBase.OutputHelper`. It also removes the large part of the comment on `ProcessorBase` since it has become quite stale, and it is better to take a look at the existing users of the struct as a guide (the actual code should be up-to-date!). Release note: None
In some cases, `ProcOutputHelper` that lives in the `ProcessorBase` is not used by the caller, yet it is always allocated. This commit extracts `ProcessorBaseNoHelper` that doesn't contain the helper (as well as some other fields) and uses that in the materializers and columnarizers. This should reduce the size of the materializers slightly. This commit was prompted by the fact that the follow-up commit will introduce another processor (the vectorized flow coordinator) that also doesn't utilize the `ProcOutputHelper`. Release note: None
0d41a6a
to
0f66f5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r1, 19 of 19 files at r2, 16 of 16 files at r3, 15 of 18 files at r4, 41 of 41 files at r5, 19 of 19 files at r6, 17 of 17 files at r7, 18 of 18 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colflow/flow_coordinator.go, line 62 at r8 (raw file):
cancelFlow func() context.CancelFunc, ) *FlowCoordinator { f := flowCoordinatorPool.Get().(*FlowCoordinator)
Where do we call flowCoordinatorPool.Put
?
pkg/sql/colflow/flow_coordinator.go, line 122 at r8 (raw file):
Quoted 5 lines of code…
if err := colexecerror.CatchVectorizedRuntimeError(func() { f.input.Start(ctx) }); err != nil { f.MoveToDraining(err) }
Is this necessary? Won't there always be a Materializer somewhere beneath this in the stack, already catching these panics?
pkg/sql/colflow/flow_coordinator.go, line 152 at r8 (raw file):
Quoted 4 lines of code…
if err := colexecerror.CatchVectorizedRuntimeError(f.nextAdapter); err != nil { f.MoveToDraining(err) return nil, f.DrainHelper() }
Same question: Won't there always be a Materializer somewhere beneath this catching these panics?
This commit extracts the logic of shutting down the vectorized flow out of the materializer which simplifies the latter. This allows us to optimize the case when the root of the whole plan is a wrapped row-execution processor. Previously, in such a scenario we would plan a columnarizer followed by a materializer because the latter was needed in order to shut the flow down. This commit removes this redundant pair of operators. Release note: None
0f66f5c
to
a4bf824
Compare
Found a bug with Thanks for the review! Just noticed that you also spotted the bug with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)
pkg/sql/colflow/flow_coordinator.go, line 62 at r8 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Where do we call
flowCoordinatorPool.Put
?
Nice catch! Fixed. (I also found it while working on #50857).
pkg/sql/colflow/flow_coordinator.go, line 122 at r8 (raw file):
Previously, michae2 (Michael Erickson) wrote…
if err := colexecerror.CatchVectorizedRuntimeError(func() { f.input.Start(ctx) }); err != nil { f.MoveToDraining(err) }
Is this necessary? Won't there always be a Materializer somewhere beneath this in the stack, already catching these panics?
No, there might not be - if we don't have any wrapped processors (i.e. the whole flow consists only of colexecop.Operator
s), then there won't be any materializers. We always have to put the catcher in the root components (the flow coordinator, the outbox, the hash router), but we also have the catcher in the parallel unordered sync because it spins up separate goroutines.
pkg/sql/colflow/flow_coordinator.go, line 152 at r8 (raw file):
Previously, michae2 (Michael Erickson) wrote…
if err := colexecerror.CatchVectorizedRuntimeError(f.nextAdapter); err != nil { f.MoveToDraining(err) return nil, f.DrainHelper() }
Same question: Won't there always be a Materializer somewhere beneath this catching these panics?
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colflow/flow_coordinator.go, line 62 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Nice catch! Fixed. (I also found it while working on #50857).
OK. Great minds think alike!
pkg/sql/colflow/flow_coordinator.go, line 122 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
No, there might not be - if we don't have any wrapped processors (i.e. the whole flow consists only of
colexecop.Operator
s), then there won't be any materializers. We always have to put the catcher in the root components (the flow coordinator, the outbox, the hash router), but we also have the catcher in the parallel unordered sync because it spins up separate goroutines.
Ah, I was only thinking of the flow on the gateway node (it always has a Materializer, right?). Not other nodes. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colflow/flow_coordinator.go, line 122 at r8 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Ah, I was only thinking of the flow on the gateway node (it always has a Materializer, right?). Not other nodes. Thank you!
Oh, sorry, what I said was partially wrong. (I'm currently working on a batch flow coordinator where we will not have a materializer in the scenario I described above.)
With this PR it is possible to have a wrapped row-execution processor as the root, and we add a flow coordinator on top of it. Consider a chain something like flow coordinator -> zigzag Joiner
(zigzag joiner is a random processor that can read from two tables at once). In this case we will not have a materializer, so we have to have a panic catcher. However, in the case when we only have colexecop.Operator
s, then we will have a root materializer, so the panic catcher in the coordinator would be redundant in that case, but the cost of that extra panic catcher is negligent.
I'll go ahead and merge this, but I still want to run the benchmarks on this change. TFTR! bors r+ |
Build succeeded: |
The new benchmarks also didn't show noticeable difference on kv0. GCE:
AWS:
|
execinfra: refactor ProcessorBase slightly
Previously, the output of a processor was embedded in
ProcOutputHelper
. This commit moves it out into theProcessorBase
because the follow-up commit will take advantage of such placement.
Release note: None
execinfra: rename a field in ProcOutputHelper
This commit renames
ProcessorBase.Out
toProcessorBase.OutputHelper
.It also removes the large part of the comment on
ProcessorBase
sinceit has become quite stale, and it is better to take a look at the
existing users of the struct as a guide (the actual code should be
up-to-date!).
Release note: None
execinfra: separate out ProcOutputHelper from ProcessorBase
In some cases,
ProcOutputHelper
that lives in theProcessorBase
isnot used by the caller, yet it is always allocated. This commit extracts
ProcessorBaseNoHelper
that doesn't contain the helper (as well as someother fields) and uses that in the materializer. This should reduce the
size of the materializers slightly.
This commit was prompted by the fact that the follow-up commit will
introduce another processor (the vectorized flow coordinator) that also
doesn't utilize the
ProcOutputHelper
.Release note: None
colflow: introduce flow coordinator and simplify materializer
This commit extracts the logic of shutting down the vectorized flow out
of the materializer which simplifies the latter. This allows us to
optimize the case when the root of the whole plan is a wrapped
row-execution processor. Previously, in such a scenario we would plan
a columnarizer followed by a materializer because the latter was needed
in order to shut the flow down. This commit removes this redundant pair
of operators.
Informs: #50857.
Informs: #55758.
Release note: None