-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexec: wrap most unsupported processor cores into the vectorized flow #42873
Conversation
ac7cc38
to
f514968
Compare
f514968
to
a661dd9
Compare
5643e6f
to
1af6ab3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 22 of 22 files at r1, 32 of 32 files at r2, 10 of 10 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)
pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):
"github.com/cockroachdb/cockroach/pkg/sql/execinfra" "github.com/cockroachdb/cockroach/pkg/sql/execinfrapb" "github.com/cockroachdb/cockroach/pkg/sql/rowexec"
This dependency makes me slightly uncomfortable. Is there a way to avoid it?
pkg/sql/colexec/execplan.go, line 244 at r1 (raw file):
// processor described by spec. Note that it doesn't perform any other checks // (like validity of the number of inputs). func isSupported(spec *execinfrapb.ProcessorSpec) (bool, error) {
I like this, does this mean we can change our supported check (the one we have before sending out the flows for vectorization) to be more lightweight? Or would it not have much benefit given the interface you added to make constructions noops?
pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):
} c, err = wrapRowSources(
Should we have a heuristic that prohibits more than n
wraps? I wonder what the threshold is where we'd just benefit from running the whole flow in the row execution engine.
pkg/sql/colexec/execplan.go, line 452 at r1 (raw file):
// We say that the wrapped processor is "streaming" because it is not a // buffering operator (even if it is a buffering processor).
Could you just mention that memory used will still be accounted for?
pkg/sql/execinfra/stats.proto, line 19 at r2 (raw file):
// InputStats represents the stats collected from an input. message InputStats {
Will the fact that these protos moved affect displaying these stats?
pkg/sql/logictest/testdata/logic_test/tpch_vec, line 781 at r2 (raw file):
│ └ Node 1 └ *colexec.sortOp
Are we sure we want to implement the visibility into the non-vectorized tree by having processors implement the OpNode
interface? It weird to have to implement a vectorized interface, and processors that haven't done so won't show up in explain. Is there a way to delegate explaining of a non-vectorized tree to another function? Or maybe have the node be an opaque "wrapped processor" (similar to LocalProcessor
), but I'm not sure whether that's better. What you have might be the best solution right now.
1af6ab3
to
0b5ef7a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)
pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
This dependency makes me slightly uncomfortable. Is there a way to avoid it?
I don't like it myself, but philosophically speaking, this dependency was already there the moment we decided to wrap any processor. Previously it was solved by having a few processor live in execinfra
, but that's not right - the processors should live in rowexec
, and this PR fixes that.
Now the question is how do we handle the mixed-flow? Ideally, the setup of the vectorized flow (which can include processors) would depend on rowexec
, but colexec
would only know about columnar execution.
While typing out this comment, I realized that it's a minor change, so I added another commit to do that.
pkg/sql/colexec/execplan.go, line 244 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I like this, does this mean we can change our supported check (the one we have before sending out the flows for vectorization) to be more lightweight? Or would it not have much benefit given the interface you added to make constructions noops?
I don't see much benefit in that direction - currently, we still want to fully instantiate the columnar operators because operators' constructors can return errors other than this isSupported
check does.
pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Should we have a heuristic that prohibits more than
n
wraps? I wonder what the threshold is where we'd just benefit from running the whole flow in the row execution engine.
Good point, I was also thinking about this. I think that we mostly care about the pair of materializer - columnarizer
that is added whenever the input to the wrapped processor is not a columnarizer
already. We could in SupportsVectorized
check simply traverse all the leaves
(the execinfra.OpNode
s) and see how many columnarizer
s we have in there relative to the total number of nodes. I'd say that if > 20% of all nodes are columnarizer
s, then we will fallback to row-by-row engine with auto
setting. Thoughts?
I also was thinking that we might wanna introduce a vectorize
setting so that only columnar operators are planned with wrapping disallowed completely (even for index
and lookup
joins).
pkg/sql/colexec/execplan.go, line 452 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Could you just mention that memory used will still be accounted for?
Done.
pkg/sql/execinfra/stats.proto, line 19 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Will the fact that these protos moved affect displaying these stats?
I believe there will be no difference. Important point is that all protos are defined as if they are in package cockroach.sql.distsqlrun;
.
pkg/sql/logictest/testdata/logic_test/tpch_vec, line 781 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Are we sure we want to implement the visibility into the non-vectorized tree by having processors implement the
OpNode
interface? It weird to have to implement a vectorized interface, and processors that haven't done so won't show up in explain. Is there a way to delegate explaining of a non-vectorized tree to another function? Or maybe have the node be an opaque "wrapped processor" (similar toLocalProcessor
), but I'm not sure whether that's better. What you have might be the best solution right now.
Hm, I kinda like the way this output looks - it is clear that this vectorized flow is actually a "mixed" flow. I'd definitely want to see the names of the wrapped processors.
I agree that the fact that processors are implementing execinfra.OpNode
interface is somewhat broken, but we already had that for indexJoiner
and joinReader
, and in this PR I've gone over all processors that implement execinfra.RowSource
(necessary condition for being wrapped) and added the implementation. I think right now this is the least invasive change that does the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 36 of 36 files at r4, 32 of 32 files at r5, 10 of 10 files at r6, 14 of 14 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)
pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):
Previously, yuzefovich wrote…
I don't like it myself, but philosophically speaking, this dependency was already there the moment we decided to wrap any processor. Previously it was solved by having a few processor live in
execinfra
, but that's not right - the processors should live inrowexec
, and this PR fixes that.Now the question is how do we handle the mixed-flow? Ideally, the setup of the vectorized flow (which can include processors) would depend on
rowexec
, butcolexec
would only know about columnar execution.While typing out this comment, I realized that it's a minor change, so I added another commit to do that.
Awesome!
pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):
I'd say that if > 20% of all nodes are columnarizers, then we will fallback to row-by-row engine with auto setting. Thoughts?
That SGTM. We might not want to get too clever with this, I think this decision should probably be pulled out into the optimizer at some point.
I also was thinking that we might wanna introduce a vectorize setting so that only columnar operators are planned with wrapping disallowed completely (even for index and lookup joins).
Not sure if we'd benefit much from that, what would that be for?
pkg/sql/colflow/colbatch_scan_test.go, line 82 at r7 (raw file):
b.StopTimer() res, err := colexec.NewColOperator( ctx, &flowCtx, &spec, nil /* inputs */, testMemAcc,
With the addition of processorConstructor
, I think we should change the arguments to be part of an arg struct to organize them better and make these calls less verbose in another PR
This commit refactors the setup of the vectorized flow so that all processor cores that do not have an equivalent columnar operator are wrapped with a materializer and a columnarizer. It extracts all the checks for whether the core is "supported" into a new function, and if the core is not supported, it'll get wrapped. All such wrapped processors are marked as "streaming" because they are not buffering operators (although they could be buffering processors), and as a result, such wrapped processors will be planned when vectorize is set to 'auto'. The only processor cores that are not wrapped are LocalPlanNode (because wrapping it seems to break some assumptions, and a flow with such processor probably will not benefit from vectorization anyway) and MetadataTest{Sender,Receiver} (because the vectorized flow propagates the metadata in an unexpected way to these processors). Release note: None
Previously IndexJoiner and JoinReader lived in execinfra because the vectorized flow (colflow package) needed to instantiate them. However, now that we're wrapping almost all unsupported processors into the vectorized flow, we are instantiating them using NewProcessor method, and all processors can be unexported. This commit also moves the related things from execinfra to rowexec and unexports a bunch of stuff as well (previously, it was exported because processors in rowexec and execinfra needed access, but now everything is in rowexec). Release note: None
Now that the processors can be wrapped into the vectorized flow, for them to show up properly during EXPLAIN (VEC), they need to implement execinfra.OpNode interface. This commit adds that. Release note: None
Previously, there was a dependency on sql/rowexec package from sql/colexec package because we wrapping processors that do not have an equivalent columnar operator into the vectorized flow. This commit moves that dependency into sql/colflow package. It is done by abstracting away a function-constructor that lives in sql/execinfra. Release note: None
0b5ef7a
to
b5a91ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @jordanlewis, and @yuzefovich)
pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I'd say that if > 20% of all nodes are columnarizers, then we will fallback to row-by-row engine with auto setting. Thoughts?
That SGTM. We might not want to get too clever with this, I think this decision should probably be pulled out into the optimizer at some point.I also was thinking that we might wanna introduce a vectorize setting so that only columnar operators are planned with wrapping disallowed completely (even for index and lookup joins).
Not sure if we'd benefit much from that, what would that be for?
I'm not sure, but it seems that the wrapped processors could be the bottlenecks in the mixed flows, disabling the wrapping would be somewhat similar to experimental_always
. In theory, we would also need to tweak the optimizer cost model so that the plans with columnar operators are favored, but this idea is definitely not fully fleshed out.
pkg/sql/colflow/colbatch_scan_test.go, line 82 at r7 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
With the addition of
processorConstructor
, I think we should change the arguments to be part of an arg struct to organize them better and make these calls less verbose in another PR
Yeah, I agree.
42873: colexec: wrap most unsupported processor cores into the vectorized flow r=yuzefovich a=yuzefovich **colexec: wrap all unsupported processor cores into the vectorized flow** This commit refactors the setup of the vectorized flow so that all processor cores that do not have an equivalent columnar operator are wrapped with a materializer and a columnarizer. It extracts all the checks for whether the core is "supported" into a new function, and if the core is not supported, it'll get wrapped. All such wrapped processors are marked as "streaming" because they are not buffering operators (although they could be buffering processors), and as a result, such wrapped processors will be planned when vectorize is set to 'auto'. The only processor cores that are not wrapped are LocalPlanNode (because wrapping it seems to break some assumptions, and a flow with such processor probably will not benefit from vectorization anyway) and MetadataTest{Sender,Receiver} (because the vectorized flow propagates the metadata in an unexpected way to these processors). Release note: None **rowexec, execinfra: move all processors into rowexec** Previously IndexJoiner and JoinReader lived in execinfra because the vectorized flow (colflow package) needed to instantiate them. However, now that we're wrapping almost all unsupported processors into the vectorized flow, we are instantiating them using NewProcessor method, and all processors can be unexported. This commit also moves the related things from execinfra to rowexec and unexports a bunch of stuff as well (previously, it was exported because processors in rowexec and execinfra needed access, but now everything is in rowexec). Release note: None **rowexec: implement execinfra.OpNode interface by some processors** Now that the processors can be wrapped into the vectorized flow, for them to show up properly during EXPLAIN (VEC), they need to implement execinfra.OpNode interface. This commit adds that. Release note: None **sql: move dependency on rowexec from colexec to colflow** Previously, there was a dependency on sql/rowexec package from sql/colexec package because we wrapping processors that do not have an equivalent columnar operator into the vectorized flow. This commit moves that dependency into sql/colflow package. It is done by abstracting away a function-constructor that lives in sql/execinfra. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Build succeeded |
Release justification: bug fixes and low-risk updates to new functionality. In cockroachdb#42873 we started wrapping most processors into the vectorized flow, and most of the processors implemented execinfra.OpNode interface at that time. However, we missed tableReader (I thought there would be no vectorized flows with them, only with colBatchScans, and as it turns out, it is now possible to have such a flow), and this commit fixes it. Release note: None
Release justification: bug fixes and low-risk updates to new functionality. In cockroachdb#42873 we started wrapping most processors into the vectorized flow, and most of the processors implemented execinfra.OpNode interface at that time. However, we missed tableReader (I thought there would be no vectorized flows with them, only with colBatchScans, and as it turns out, it is now possible to have such a flow), and this commit fixes it. Release note: None
46062: colexec: add disk monitor to disk queue r=Azhng a=Azhng This commit plumbs down disk monitor from FlowCtx to disk queue. It will be monitoring disk usage of vectorize engine similar to row engine. Release justification: bug fixes and low-risk updates to new functionality since it is necessary for the new disk spilling infrastructure to have disk monitoring Release note: None Closes #45169 46200: rowexec: make tableReader implement execinfra.OpNode interface r=yuzefovich a=yuzefovich Release justification: bug fixes and low-risk updates to new functionality. In #42873 we started wrapping most processors into the vectorized flow, and most of the processors implemented execinfra.OpNode interface at that time. However, we missed tableReader (I thought there would be no vectorized flows with them, only with colBatchScans, and as it turns out, it is now possible to have such a flow), and this commit fixes it. Fixes: #46123. Release note: None Co-authored-by: Azhng <archer.xn@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
colexec: wrap all unsupported processor cores into the vectorized flow
This commit refactors the setup of the vectorized flow so that all
processor cores that do not have an equivalent columnar operator are
wrapped with a materializer and a columnarizer. It extracts all the
checks for whether the core is "supported" into a new function, and if
the core is not supported, it'll get wrapped. All such wrapped
processors are marked as "streaming" because they are not buffering
operators (although they could be buffering processors), and as
a result, such wrapped processors will be planned when vectorize is set
to 'auto'.
The only processor cores that are not wrapped are LocalPlanNode (because
wrapping it seems to break some assumptions, and a flow with such
processor probably will not benefit from vectorization anyway) and
MetadataTest{Sender,Receiver} (because the vectorized flow propagates
the metadata in an unexpected way to these processors).
Release note: None
rowexec, execinfra: move all processors into rowexec
Previously IndexJoiner and JoinReader lived in execinfra because the
vectorized flow (colflow package) needed to instantiate them. However,
now that we're wrapping almost all unsupported processors into the
vectorized flow, we are instantiating them using NewProcessor method,
and all processors can be unexported. This commit also moves the related
things from execinfra to rowexec and unexports a bunch of stuff as well
(previously, it was exported because processors in rowexec and execinfra
needed access, but now everything is in rowexec).
Release note: None
rowexec: implement execinfra.OpNode interface by some processors
Now that the processors can be wrapped into the vectorized flow, for
them to show up properly during EXPLAIN (VEC), they need to implement
execinfra.OpNode interface. This commit adds that.
Release note: None
sql: move dependency on rowexec from colexec to colflow
Previously, there was a dependency on sql/rowexec package from
sql/colexec package because we wrapping processors that do not have an
equivalent columnar operator into the vectorized flow. This commit moves
that dependency into sql/colflow package. It is done by abstracting away
a function-constructor that lives in sql/execinfra.
Release note: None