Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: wrap most unsupported processor cores into the vectorized flow #42873

Merged
merged 4 commits into from
Dec 16, 2019

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Nov 30, 2019

colexec: wrap all unsupported processor cores into the vectorized flow

This commit refactors the setup of the vectorized flow so that all
processor cores that do not have an equivalent columnar operator are
wrapped with a materializer and a columnarizer. It extracts all the
checks for whether the core is "supported" into a new function, and if
the core is not supported, it'll get wrapped. All such wrapped
processors are marked as "streaming" because they are not buffering
operators (although they could be buffering processors), and as
a result, such wrapped processors will be planned when vectorize is set
to 'auto'.

The only processor cores that are not wrapped are LocalPlanNode (because
wrapping it seems to break some assumptions, and a flow with such
processor probably will not benefit from vectorization anyway) and
MetadataTest{Sender,Receiver} (because the vectorized flow propagates
the metadata in an unexpected way to these processors).

Release note: None

rowexec, execinfra: move all processors into rowexec

Previously IndexJoiner and JoinReader lived in execinfra because the
vectorized flow (colflow package) needed to instantiate them. However,
now that we're wrapping almost all unsupported processors into the
vectorized flow, we are instantiating them using NewProcessor method,
and all processors can be unexported. This commit also moves the related
things from execinfra to rowexec and unexports a bunch of stuff as well
(previously, it was exported because processors in rowexec and execinfra
needed access, but now everything is in rowexec).

Release note: None

rowexec: implement execinfra.OpNode interface by some processors

Now that the processors can be wrapped into the vectorized flow, for
them to show up properly during EXPLAIN (VEC), they need to implement
execinfra.OpNode interface. This commit adds that.

Release note: None

sql: move dependency on rowexec from colexec to colflow

Previously, there was a dependency on sql/rowexec package from
sql/colexec package because we wrapping processors that do not have an
equivalent columnar operator into the vectorized flow. This commit moves
that dependency into sql/colflow package. It is done by abstracting away
a function-constructor that lives in sql/execinfra.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the vec-wrap-all branch 2 times, most recently from ac7cc38 to f514968 Compare November 30, 2019 20:57
@yuzefovich yuzefovich changed the title colexec: wrap all unsupported processor cores into the vectorized flow colexec: wrap most unsupported processor cores into the vectorized flow Nov 30, 2019
@yuzefovich yuzefovich requested a review from a team as a code owner December 2, 2019 18:29
@yuzefovich yuzefovich force-pushed the vec-wrap-all branch 2 times, most recently from 5643e6f to 1af6ab3 Compare December 4, 2019 21:37
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 22 of 22 files at r1, 32 of 32 files at r2, 10 of 10 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):

	"github.com/cockroachdb/cockroach/pkg/sql/execinfra"
	"github.com/cockroachdb/cockroach/pkg/sql/execinfrapb"
	"github.com/cockroachdb/cockroach/pkg/sql/rowexec"

This dependency makes me slightly uncomfortable. Is there a way to avoid it?


pkg/sql/colexec/execplan.go, line 244 at r1 (raw file):

// processor described by spec. Note that it doesn't perform any other checks
// (like validity of the number of inputs).
func isSupported(spec *execinfrapb.ProcessorSpec) (bool, error) {

I like this, does this mean we can change our supported check (the one we have before sending out the flows for vectorization) to be more lightweight? Or would it not have much benefit given the interface you added to make constructions noops?


pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):

		}

		c, err = wrapRowSources(

Should we have a heuristic that prohibits more than n wraps? I wonder what the threshold is where we'd just benefit from running the whole flow in the row execution engine.


pkg/sql/colexec/execplan.go, line 452 at r1 (raw file):

		// We say that the wrapped processor is "streaming" because it is not a
		// buffering operator (even if it is a buffering processor).

Could you just mention that memory used will still be accounted for?


pkg/sql/execinfra/stats.proto, line 19 at r2 (raw file):

// InputStats represents the stats collected from an input.
message InputStats {

Will the fact that these protos moved affect displaying these stats?


pkg/sql/logictest/testdata/logic_test/tpch_vec, line 781 at r2 (raw file):

│
└ Node 1
  └ *colexec.sortOp

Are we sure we want to implement the visibility into the non-vectorized tree by having processors implement the OpNode interface? It weird to have to implement a vectorized interface, and processors that haven't done so won't show up in explain. Is there a way to delegate explaining of a non-vectorized tree to another function? Or maybe have the node be an opaque "wrapped processor" (similar to LocalProcessor), but I'm not sure whether that's better. What you have might be the best solution right now.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)


pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

This dependency makes me slightly uncomfortable. Is there a way to avoid it?

I don't like it myself, but philosophically speaking, this dependency was already there the moment we decided to wrap any processor. Previously it was solved by having a few processor live in execinfra, but that's not right - the processors should live in rowexec, and this PR fixes that.

Now the question is how do we handle the mixed-flow? Ideally, the setup of the vectorized flow (which can include processors) would depend on rowexec, but colexec would only know about columnar execution.

While typing out this comment, I realized that it's a minor change, so I added another commit to do that.


pkg/sql/colexec/execplan.go, line 244 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I like this, does this mean we can change our supported check (the one we have before sending out the flows for vectorization) to be more lightweight? Or would it not have much benefit given the interface you added to make constructions noops?

I don't see much benefit in that direction - currently, we still want to fully instantiate the columnar operators because operators' constructors can return errors other than this isSupported check does.


pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Should we have a heuristic that prohibits more than n wraps? I wonder what the threshold is where we'd just benefit from running the whole flow in the row execution engine.

Good point, I was also thinking about this. I think that we mostly care about the pair of materializer - columnarizer that is added whenever the input to the wrapped processor is not a columnarizer already. We could in SupportsVectorized check simply traverse all the leaves (the execinfra.OpNodes) and see how many columnarizers we have in there relative to the total number of nodes. I'd say that if > 20% of all nodes are columnarizers, then we will fallback to row-by-row engine with auto setting. Thoughts?

I also was thinking that we might wanna introduce a vectorize setting so that only columnar operators are planned with wrapping disallowed completely (even for index and lookup joins).


pkg/sql/colexec/execplan.go, line 452 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Could you just mention that memory used will still be accounted for?

Done.


pkg/sql/execinfra/stats.proto, line 19 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Will the fact that these protos moved affect displaying these stats?

I believe there will be no difference. Important point is that all protos are defined as if they are in package cockroach.sql.distsqlrun;.


pkg/sql/logictest/testdata/logic_test/tpch_vec, line 781 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Are we sure we want to implement the visibility into the non-vectorized tree by having processors implement the OpNode interface? It weird to have to implement a vectorized interface, and processors that haven't done so won't show up in explain. Is there a way to delegate explaining of a non-vectorized tree to another function? Or maybe have the node be an opaque "wrapped processor" (similar to LocalProcessor), but I'm not sure whether that's better. What you have might be the best solution right now.

Hm, I kinda like the way this output looks - it is clear that this vectorized flow is actually a "mixed" flow. I'd definitely want to see the names of the wrapped processors.

I agree that the fact that processors are implementing execinfra.OpNode interface is somewhat broken, but we already had that for indexJoiner and joinReader, and in this PR I've gone over all processors that implement execinfra.RowSource (necessary condition for being wrapped) and added the implementation. I think right now this is the least invasive change that does the job.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 36 of 36 files at r4, 32 of 32 files at r5, 10 of 10 files at r6, 14 of 14 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/colexec/execplan.go, line 24 at r1 (raw file):

Previously, yuzefovich wrote…

I don't like it myself, but philosophically speaking, this dependency was already there the moment we decided to wrap any processor. Previously it was solved by having a few processor live in execinfra, but that's not right - the processors should live in rowexec, and this PR fixes that.

Now the question is how do we handle the mixed-flow? Ideally, the setup of the vectorized flow (which can include processors) would depend on rowexec, but colexec would only know about columnar execution.

While typing out this comment, I realized that it's a minor change, so I added another commit to do that.

Awesome!


pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):

I'd say that if > 20% of all nodes are columnarizers, then we will fallback to row-by-row engine with auto setting. Thoughts?
That SGTM. We might not want to get too clever with this, I think this decision should probably be pulled out into the optimizer at some point.

I also was thinking that we might wanna introduce a vectorize setting so that only columnar operators are planned with wrapping disallowed completely (even for index and lookup joins).
Not sure if we'd benefit much from that, what would that be for?


pkg/sql/colflow/colbatch_scan_test.go, line 82 at r7 (raw file):

				b.StopTimer()
				res, err := colexec.NewColOperator(
					ctx, &flowCtx, &spec, nil /* inputs */, testMemAcc,

With the addition of processorConstructor, I think we should change the arguments to be part of an arg struct to organize them better and make these calls less verbose in another PR

This commit refactors the setup of the vectorized flow so that all
processor cores that do not have an equivalent columnar operator are
wrapped with a materializer and a columnarizer. It extracts all the
checks for whether the core is "supported" into a new function, and if
the core is not supported, it'll get wrapped. All such wrapped
processors are marked as "streaming" because they are not buffering
operators (although they could be buffering processors), and as
a result, such wrapped processors will be planned when vectorize is set
to 'auto'.

The only processor cores that are not wrapped are LocalPlanNode (because
wrapping it seems to break some assumptions, and a flow with such
processor probably will not benefit from vectorization anyway) and
MetadataTest{Sender,Receiver} (because the vectorized flow propagates
the metadata in an unexpected way to these processors).

Release note: None
Previously IndexJoiner and JoinReader lived in execinfra because the
vectorized flow (colflow package) needed to instantiate them. However,
now that we're wrapping almost all unsupported processors into the
vectorized flow, we are instantiating them using NewProcessor method,
and all processors can be unexported. This commit also moves the related
things from execinfra to rowexec and unexports a bunch of stuff as well
(previously, it was exported because processors in rowexec and execinfra
needed access, but now everything is in rowexec).

Release note: None
Now that the processors can be wrapped into the vectorized flow, for
them to show up properly during EXPLAIN (VEC), they need to implement
execinfra.OpNode interface. This commit adds that.

Release note: None
Previously, there was a dependency on sql/rowexec package from
sql/colexec package because we wrapping processors that do not have an
equivalent columnar operator into the vectorized flow. This commit moves
that dependency into sql/colflow package. It is done by abstracting away
a function-constructor that lives in sql/execinfra.

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @jordanlewis, and @yuzefovich)


pkg/sql/colexec/execplan.go, line 413 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I'd say that if > 20% of all nodes are columnarizers, then we will fallback to row-by-row engine with auto setting. Thoughts?
That SGTM. We might not want to get too clever with this, I think this decision should probably be pulled out into the optimizer at some point.

I also was thinking that we might wanna introduce a vectorize setting so that only columnar operators are planned with wrapping disallowed completely (even for index and lookup joins).
Not sure if we'd benefit much from that, what would that be for?

I'm not sure, but it seems that the wrapped processors could be the bottlenecks in the mixed flows, disabling the wrapping would be somewhat similar to experimental_always. In theory, we would also need to tweak the optimizer cost model so that the plans with columnar operators are favored, but this idea is definitely not fully fleshed out.


pkg/sql/colflow/colbatch_scan_test.go, line 82 at r7 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

With the addition of processorConstructor, I think we should change the arguments to be part of an arg struct to organize them better and make these calls less verbose in another PR

Yeah, I agree.

craig bot pushed a commit that referenced this pull request Dec 16, 2019
42873: colexec: wrap most unsupported processor cores into the vectorized flow r=yuzefovich a=yuzefovich

**colexec: wrap all unsupported processor cores into the vectorized flow**

This commit refactors the setup of the vectorized flow so that all
processor cores that do not have an equivalent columnar operator are
wrapped with a materializer and a columnarizer. It extracts all the
checks for whether the core is "supported" into a new function, and if
the core is not supported, it'll get wrapped. All such wrapped
processors are marked as "streaming" because they are not buffering
operators (although they could be buffering processors), and as
a result, such wrapped processors will be planned when vectorize is set
to 'auto'.

The only processor cores that are not wrapped are LocalPlanNode (because
wrapping it seems to break some assumptions, and a flow with such
processor probably will not benefit from vectorization anyway) and
MetadataTest{Sender,Receiver} (because the vectorized flow propagates
the metadata in an unexpected way to these processors).

Release note: None

**rowexec, execinfra: move all processors into rowexec**

Previously IndexJoiner and JoinReader lived in execinfra because the
vectorized flow (colflow package) needed to instantiate them. However,
now that we're wrapping almost all unsupported processors into the
vectorized flow, we are instantiating them using NewProcessor method,
and all processors can be unexported. This commit also moves the related
things from execinfra to rowexec and unexports a bunch of stuff as well
(previously, it was exported because processors in rowexec and execinfra
needed access, but now everything is in rowexec).

Release note: None

**rowexec: implement execinfra.OpNode interface by some processors**

Now that the processors can be wrapped into the vectorized flow, for
them to show up properly during EXPLAIN (VEC), they need to implement
execinfra.OpNode interface. This commit adds that.

Release note: None

**sql: move dependency on rowexec from colexec to colflow**

Previously, there was a dependency on sql/rowexec package from
sql/colexec package because we wrapping processors that do not have an
equivalent columnar operator into the vectorized flow. This commit moves
that dependency into sql/colflow package. It is done by abstracting away
a function-constructor that lives in sql/execinfra.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Dec 16, 2019

Build succeeded

@craig craig bot merged commit b5a91ee into cockroachdb:master Dec 16, 2019
@yuzefovich yuzefovich deleted the vec-wrap-all branch December 16, 2019 18:02
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Mar 17, 2020
Release justification: bug fixes and low-risk updates to new
functionality.

In cockroachdb#42873 we started wrapping most processors into the vectorized flow,
and most of the processors implemented execinfra.OpNode interface at
that time. However, we missed tableReader (I thought there would be no
vectorized flows with them, only with colBatchScans, and as it turns
out, it is now possible to have such a flow), and this commit fixes it.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Mar 17, 2020
Release justification: bug fixes and low-risk updates to new
functionality.

In cockroachdb#42873 we started wrapping most processors into the vectorized flow,
and most of the processors implemented execinfra.OpNode interface at
that time. However, we missed tableReader (I thought there would be no
vectorized flows with them, only with colBatchScans, and as it turns
out, it is now possible to have such a flow), and this commit fixes it.

Release note: None
craig bot pushed a commit that referenced this pull request Mar 17, 2020
46062: colexec: add disk monitor to disk queue r=Azhng a=Azhng

This commit plumbs down disk monitor from FlowCtx to disk queue. It will
be monitoring disk usage of vectorize engine similar to row engine.

Release justification: bug fixes and low-risk updates to new functionality
since it is necessary for the new disk spilling infrastructure to have
disk monitoring

Release note: None

Closes #45169

46200: rowexec: make tableReader implement execinfra.OpNode interface r=yuzefovich a=yuzefovich

Release justification: bug fixes and low-risk updates to new
functionality.

In #42873 we started wrapping most processors into the vectorized flow,
and most of the processors implemented execinfra.OpNode interface at
that time. However, we missed tableReader (I thought there would be no
vectorized flows with them, only with colBatchScans, and as it turns
out, it is now possible to have such a flow), and this commit fixes it.

Fixes: #46123.

Release note: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants