colexec: wrap most unsupported processor cores into the vectorized flow #42873

yuzefovich · 2019-11-30T16:50:02Z

colexec: wrap all unsupported processor cores into the vectorized flow

This commit refactors the setup of the vectorized flow so that all
processor cores that do not have an equivalent columnar operator are
wrapped with a materializer and a columnarizer. It extracts all the
checks for whether the core is "supported" into a new function, and if
the core is not supported, it'll get wrapped. All such wrapped
processors are marked as "streaming" because they are not buffering
operators (although they could be buffering processors), and as
a result, such wrapped processors will be planned when vectorize is set
to 'auto'.

The only processor cores that are not wrapped are LocalPlanNode (because
wrapping it seems to break some assumptions, and a flow with such
processor probably will not benefit from vectorization anyway) and
MetadataTest{Sender,Receiver} (because the vectorized flow propagates
the metadata in an unexpected way to these processors).

Release note: None

rowexec, execinfra: move all processors into rowexec

Previously IndexJoiner and JoinReader lived in execinfra because the
vectorized flow (colflow package) needed to instantiate them. However,
now that we're wrapping almost all unsupported processors into the
vectorized flow, we are instantiating them using NewProcessor method,
and all processors can be unexported. This commit also moves the related
things from execinfra to rowexec and unexports a bunch of stuff as well
(previously, it was exported because processors in rowexec and execinfra
needed access, but now everything is in rowexec).

Release note: None

rowexec: implement execinfra.OpNode interface by some processors

Now that the processors can be wrapped into the vectorized flow, for
them to show up properly during EXPLAIN (VEC), they need to implement
execinfra.OpNode interface. This commit adds that.

Release note: None

sql: move dependency on rowexec from colexec to colflow

Previously, there was a dependency on sql/rowexec package from
sql/colexec package because we wrapping processors that do not have an
equivalent columnar operator into the vectorized flow. This commit moves
that dependency into sql/colflow package. It is done by abstracting away
a function-constructor that lives in sql/execinfra.

Release note: None

cockroach-teamcity · 2019-11-30T16:50:09Z

This change is

asubiotto

Reviewed 22 of 22 files at r1, 32 of 32 files at r2, 10 of 10 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)

pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):

	"github.com/cockroachdb/cockroach/pkg/sql/execinfra"
	"github.com/cockroachdb/cockroach/pkg/sql/execinfrapb"
	"github.com/cockroachdb/cockroach/pkg/sql/rowexec"

This dependency makes me slightly uncomfortable. Is there a way to avoid it?

pkg/sql/colexec/execplan.go, line 244 at r1 (raw file):

// processor described by spec. Note that it doesn't perform any other checks
// (like validity of the number of inputs).
func isSupported(spec *execinfrapb.ProcessorSpec) (bool, error) {

I like this, does this mean we can change our supported check (the one we have before sending out the flows for vectorization) to be more lightweight? Or would it not have much benefit given the interface you added to make constructions noops?

pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):

		}

		c, err = wrapRowSources(

Should we have a heuristic that prohibits more than n wraps? I wonder what the threshold is where we'd just benefit from running the whole flow in the row execution engine.

pkg/sql/colexec/execplan.go, line 452 at r1 (raw file):

		// We say that the wrapped processor is "streaming" because it is not a
		// buffering operator (even if it is a buffering processor).

Could you just mention that memory used will still be accounted for?

pkg/sql/execinfra/stats.proto, line 19 at r2 (raw file):

// InputStats represents the stats collected from an input.
message InputStats {

Will the fact that these protos moved affect displaying these stats?

pkg/sql/logictest/testdata/logic_test/tpch_vec, line 781 at r2 (raw file):

│
└ Node 1
  └ *colexec.sortOp

Are we sure we want to implement the visibility into the non-vectorized tree by having processors implement the OpNode interface? It weird to have to implement a vectorized interface, and processors that haven't done so won't show up in explain. Is there a way to delegate explaining of a non-vectorized tree to another function? Or maybe have the node be an opaque "wrapped processor" (similar to LocalProcessor), but I'm not sure whether that's better. What you have might be the best solution right now.

yuzefovich

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)

pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

This dependency makes me slightly uncomfortable. Is there a way to avoid it?

I don't like it myself, but philosophically speaking, this dependency was already there the moment we decided to wrap any processor. Previously it was solved by having a few processor live in execinfra, but that's not right - the processors should live in rowexec, and this PR fixes that.

Now the question is how do we handle the mixed-flow? Ideally, the setup of the vectorized flow (which can include processors) would depend on rowexec, but colexec would only know about columnar execution.

While typing out this comment, I realized that it's a minor change, so I added another commit to do that.

pkg/sql/colexec/execplan.go, line 244 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I like this, does this mean we can change our supported check (the one we have before sending out the flows for vectorization) to be more lightweight? Or would it not have much benefit given the interface you added to make constructions noops?

I don't see much benefit in that direction - currently, we still want to fully instantiate the columnar operators because operators' constructors can return errors other than this isSupported check does.

pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Should we have a heuristic that prohibits more than n wraps? I wonder what the threshold is where we'd just benefit from running the whole flow in the row execution engine.

Good point, I was also thinking about this. I think that we mostly care about the pair of materializer - columnarizer that is added whenever the input to the wrapped processor is not a columnarizer already. We could in SupportsVectorized check simply traverse all the leaves (the execinfra.OpNodes) and see how many columnarizers we have in there relative to the total number of nodes. I'd say that if > 20% of all nodes are columnarizers, then we will fallback to row-by-row engine with auto setting. Thoughts?

I also was thinking that we might wanna introduce a vectorize setting so that only columnar operators are planned with wrapping disallowed completely (even for index and lookup joins).

pkg/sql/colexec/execplan.go, line 452 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Could you just mention that memory used will still be accounted for?

Done.

pkg/sql/execinfra/stats.proto, line 19 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Will the fact that these protos moved affect displaying these stats?

I believe there will be no difference. Important point is that all protos are defined as if they are in package cockroach.sql.distsqlrun;.

pkg/sql/logictest/testdata/logic_test/tpch_vec, line 781 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Are we sure we want to implement the visibility into the non-vectorized tree by having processors implement the OpNode interface? It weird to have to implement a vectorized interface, and processors that haven't done so won't show up in explain. Is there a way to delegate explaining of a non-vectorized tree to another function? Or maybe have the node be an opaque "wrapped processor" (similar to LocalProcessor), but I'm not sure whether that's better. What you have might be the best solution right now.

Hm, I kinda like the way this output looks - it is clear that this vectorized flow is actually a "mixed" flow. I'd definitely want to see the names of the wrapped processors.

I agree that the fact that processors are implementing execinfra.OpNode interface is somewhat broken, but we already had that for indexJoiner and joinReader, and in this PR I've gone over all processors that implement execinfra.RowSource (necessary condition for being wrapped) and added the implementation. I think right now this is the least invasive change that does the job.

asubiotto

Reviewed 36 of 36 files at r4, 32 of 32 files at r5, 10 of 10 files at r6, 14 of 14 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)

pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):

Previously, yuzefovich wrote…

I don't like it myself, but philosophically speaking, this dependency was already there the moment we decided to wrap any processor. Previously it was solved by having a few processor live in execinfra, but that's not right - the processors should live in rowexec, and this PR fixes that.

Now the question is how do we handle the mixed-flow? Ideally, the setup of the vectorized flow (which can include processors) would depend on rowexec, but colexec would only know about columnar execution.

While typing out this comment, I realized that it's a minor change, so I added another commit to do that.

Awesome!

pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):

I'd say that if > 20% of all nodes are columnarizers, then we will fallback to row-by-row engine with auto setting. Thoughts?
That SGTM. We might not want to get too clever with this, I think this decision should probably be pulled out into the optimizer at some point.

I also was thinking that we might wanna introduce a vectorize setting so that only columnar operators are planned with wrapping disallowed completely (even for index and lookup joins).
Not sure if we'd benefit much from that, what would that be for?

pkg/sql/colflow/colbatch_scan_test.go, line 82 at r7 (raw file):

				b.StopTimer()
				res, err := colexec.NewColOperator(
					ctx, &flowCtx, &spec, nil /* inputs */, testMemAcc,

With the addition of processorConstructor, I think we should change the arguments to be part of an arg struct to organize them better and make these calls less verbose in another PR

This commit refactors the setup of the vectorized flow so that all processor cores that do not have an equivalent columnar operator are wrapped with a materializer and a columnarizer. It extracts all the checks for whether the core is "supported" into a new function, and if the core is not supported, it'll get wrapped. All such wrapped processors are marked as "streaming" because they are not buffering operators (although they could be buffering processors), and as a result, such wrapped processors will be planned when vectorize is set to 'auto'. The only processor cores that are not wrapped are LocalPlanNode (because wrapping it seems to break some assumptions, and a flow with such processor probably will not benefit from vectorization anyway) and MetadataTest{Sender,Receiver} (because the vectorized flow propagates the metadata in an unexpected way to these processors). Release note: None

Previously IndexJoiner and JoinReader lived in execinfra because the vectorized flow (colflow package) needed to instantiate them. However, now that we're wrapping almost all unsupported processors into the vectorized flow, we are instantiating them using NewProcessor method, and all processors can be unexported. This commit also moves the related things from execinfra to rowexec and unexports a bunch of stuff as well (previously, it was exported because processors in rowexec and execinfra needed access, but now everything is in rowexec). Release note: None

Now that the processors can be wrapped into the vectorized flow, for them to show up properly during EXPLAIN (VEC), they need to implement execinfra.OpNode interface. This commit adds that. Release note: None

Previously, there was a dependency on sql/rowexec package from sql/colexec package because we wrapping processors that do not have an equivalent columnar operator into the vectorized flow. This commit moves that dependency into sql/colflow package. It is done by abstracting away a function-constructor that lives in sql/execinfra. Release note: None

yuzefovich

TFTR!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @jordanlewis, and @yuzefovich)

pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I'd say that if > 20% of all nodes are columnarizers, then we will fallback to row-by-row engine with auto setting. Thoughts?
That SGTM. We might not want to get too clever with this, I think this decision should probably be pulled out into the optimizer at some point.

I also was thinking that we might wanna introduce a vectorize setting so that only columnar operators are planned with wrapping disallowed completely (even for index and lookup joins).
Not sure if we'd benefit much from that, what would that be for?

I'm not sure, but it seems that the wrapped processors could be the bottlenecks in the mixed flows, disabling the wrapping would be somewhat similar to experimental_always. In theory, we would also need to tweak the optimizer cost model so that the plans with columnar operators are favored, but this idea is definitely not fully fleshed out.

pkg/sql/colflow/colbatch_scan_test.go, line 82 at r7 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

With the addition of processorConstructor, I think we should change the arguments to be part of an arg struct to organize them better and make these calls less verbose in another PR

Yeah, I agree.

42873: colexec: wrap most unsupported processor cores into the vectorized flow r=yuzefovich a=yuzefovich **colexec: wrap all unsupported processor cores into the vectorized flow** This commit refactors the setup of the vectorized flow so that all processor cores that do not have an equivalent columnar operator are wrapped with a materializer and a columnarizer. It extracts all the checks for whether the core is "supported" into a new function, and if the core is not supported, it'll get wrapped. All such wrapped processors are marked as "streaming" because they are not buffering operators (although they could be buffering processors), and as a result, such wrapped processors will be planned when vectorize is set to 'auto'. The only processor cores that are not wrapped are LocalPlanNode (because wrapping it seems to break some assumptions, and a flow with such processor probably will not benefit from vectorization anyway) and MetadataTest{Sender,Receiver} (because the vectorized flow propagates the metadata in an unexpected way to these processors). Release note: None **rowexec, execinfra: move all processors into rowexec** Previously IndexJoiner and JoinReader lived in execinfra because the vectorized flow (colflow package) needed to instantiate them. However, now that we're wrapping almost all unsupported processors into the vectorized flow, we are instantiating them using NewProcessor method, and all processors can be unexported. This commit also moves the related things from execinfra to rowexec and unexports a bunch of stuff as well (previously, it was exported because processors in rowexec and execinfra needed access, but now everything is in rowexec). Release note: None **rowexec: implement execinfra.OpNode interface by some processors** Now that the processors can be wrapped into the vectorized flow, for them to show up properly during EXPLAIN (VEC), they need to implement execinfra.OpNode interface. This commit adds that. Release note: None **sql: move dependency on rowexec from colexec to colflow** Previously, there was a dependency on sql/rowexec package from sql/colexec package because we wrapping processors that do not have an equivalent columnar operator into the vectorized flow. This commit moves that dependency into sql/colflow package. It is done by abstracting away a function-constructor that lives in sql/execinfra. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

craig · 2019-12-16T17:59:47Z

Build succeeded

GitHub CI (Cockroach)

Release justification: bug fixes and low-risk updates to new functionality. In cockroachdb#42873 we started wrapping most processors into the vectorized flow, and most of the processors implemented execinfra.OpNode interface at that time. However, we missed tableReader (I thought there would be no vectorized flows with them, only with colBatchScans, and as it turns out, it is now possible to have such a flow), and this commit fixes it. Release note: None

46062: colexec: add disk monitor to disk queue r=Azhng a=Azhng This commit plumbs down disk monitor from FlowCtx to disk queue. It will be monitoring disk usage of vectorize engine similar to row engine. Release justification: bug fixes and low-risk updates to new functionality since it is necessary for the new disk spilling infrastructure to have disk monitoring Release note: None Closes #45169 46200: rowexec: make tableReader implement execinfra.OpNode interface r=yuzefovich a=yuzefovich Release justification: bug fixes and low-risk updates to new functionality. In #42873 we started wrapping most processors into the vectorized flow, and most of the processors implemented execinfra.OpNode interface at that time. However, we missed tableReader (I thought there would be no vectorized flows with them, only with colBatchScans, and as it turns out, it is now possible to have such a flow), and this commit fixes it. Fixes: #46123. Release note: None Co-authored-by: Azhng <archer.xn@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

yuzefovich requested review from jordanlewis and asubiotto November 30, 2019 16:50

yuzefovich force-pushed the vec-wrap-all branch 2 times, most recently from ac7cc38 to f514968 Compare November 30, 2019 20:57

yuzefovich changed the title ~~colexec: wrap all unsupported processor cores into the vectorized flow~~ colexec: wrap most unsupported processor cores into the vectorized flow Nov 30, 2019

yuzefovich force-pushed the vec-wrap-all branch from f514968 to a661dd9 Compare December 2, 2019 18:29

yuzefovich requested a review from a team as a code owner December 2, 2019 18:29

yuzefovich force-pushed the vec-wrap-all branch 2 times, most recently from 5643e6f to 1af6ab3 Compare December 4, 2019 21:37

asubiotto suggested changes Dec 5, 2019

View reviewed changes

yuzefovich force-pushed the vec-wrap-all branch from 1af6ab3 to 0b5ef7a Compare December 5, 2019 18:35

yuzefovich commented Dec 5, 2019

View reviewed changes

asubiotto approved these changes Dec 16, 2019

View reviewed changes

yuzefovich added 4 commits December 16, 2019 07:56

rowexec: implement execinfra.OpNode interface by some processors

c6f8569

Now that the processors can be wrapped into the vectorized flow, for them to show up properly during EXPLAIN (VEC), they need to implement execinfra.OpNode interface. This commit adds that. Release note: None

yuzefovich force-pushed the vec-wrap-all branch from 0b5ef7a to b5a91ee Compare December 16, 2019 16:00

yuzefovich commented Dec 16, 2019

View reviewed changes

craig bot merged commit b5a91ee into cockroachdb:master Dec 16, 2019

yuzefovich deleted the vec-wrap-all branch December 16, 2019 18:02

yuzefovich mentioned this pull request Mar 17, 2020

rowexec: make tableReader implement execinfra.OpNode interface #46200

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

colexec: wrap most unsupported processor cores into the vectorized flow #42873

colexec: wrap most unsupported processor cores into the vectorized flow #42873

yuzefovich commented Nov 30, 2019 •

edited

Loading

cockroach-teamcity commented Nov 30, 2019

asubiotto left a comment

yuzefovich left a comment

asubiotto left a comment

yuzefovich left a comment

craig bot commented Dec 16, 2019

colexec: wrap most unsupported processor cores into the vectorized flow #42873

colexec: wrap most unsupported processor cores into the vectorized flow #42873

Conversation

yuzefovich commented Nov 30, 2019 • edited Loading

cockroach-teamcity commented Nov 30, 2019

asubiotto left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

asubiotto left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

craig bot commented Dec 16, 2019

Build succeeded

yuzefovich commented Nov 30, 2019 •

edited

Loading