
colflow: introduce flow coordinator and simplify materializer #64697

Merged · 4 commits · May 15, 2021

Conversation

@yuzefovich (Member) commented May 5, 2021

execinfra: refactor ProcessorBase slightly

Previously, the output of a processor was embedded in
ProcOutputHelper. This commit moves it out into the ProcessorBase
because the follow-up commit will take advantage of such placement.

Release note: None

execinfra: rename a field in ProcOutputHelper

This commit renames ProcessorBase.Out to ProcessorBase.OutputHelper.
It also removes the large part of the comment on ProcessorBase since
it has become quite stale, and it is better to take a look at the
existing users of the struct as a guide (the actual code should be
up-to-date!).

Release note: None

execinfra: separate out ProcOutputHelper from ProcessorBase

In some cases, ProcOutputHelper that lives in the ProcessorBase is
not used by the caller, yet it is always allocated. This commit extracts
ProcessorBaseNoHelper that doesn't contain the helper (as well as some
other fields) and uses that in the materializers and columnarizers. This
should reduce the size of the materializers slightly.

This commit was prompted by the fact that the follow-up commit will
introduce another processor (the vectorized flow coordinator) that also
doesn't utilize the ProcOutputHelper.

Release note: None

colflow: introduce flow coordinator and simplify materializer

This commit extracts the logic of shutting down the vectorized flow out
of the materializer which simplifies the latter. This allows us to
optimize the case when the root of the whole plan is a wrapped
row-execution processor. Previously, in such a scenario we would plan
a columnarizer followed by a materializer because the latter was needed
in order to shut the flow down. This commit removes this redundant pair
of operators.

Informs: #50857.
Informs: #55758.

Release note: None

@cockroach-teamcity (Member):
This change is Reviewable

@yuzefovich yuzefovich changed the title colflow: introduce flow coordinator and clean up the materializer colflow: introduce flow coordinator and simplify materializer May 5, 2021
@yuzefovich yuzefovich requested a review from a team May 5, 2021 04:18
@yuzefovich yuzefovich marked this pull request as ready for review May 5, 2021 04:18
@yuzefovich yuzefovich requested review from a team and nihalpednekar and removed request for a team May 5, 2021 04:18
@yuzefovich (Member Author):
I still need to run the benchmarks, but I'm hopeful that this will be a slight improvement for write-heavy workloads (like KV0) given that we remove some redundant stuff.

@yuzefovich yuzefovich removed the request for review from nihalpednekar May 5, 2021 04:19
@yuzefovich yuzefovich force-pushed the flow-coordinator branch 4 times, most recently from bfce61e to 436de34 Compare May 5, 2021 05:29
@yuzefovich (Member Author):
Alright, I think I'm done force-pushing. The benchmarks are still missing, but otherwise it is RFAL.

@yuzefovich yuzefovich force-pushed the flow-coordinator branch 2 times, most recently from 41d6a7b to 0d41a6a Compare May 5, 2021 19:19
@yuzefovich (Member Author):
kv0 benchmarks don't show much difference (maybe I'm not pushing the benchmark hard enough). The rows below are standard `workload` summary lines with columns elapsed, errors, ops(total), ops/sec(cum), avg(ms), p50(ms), p95(ms), p99(ms), pMax(ms):

  • GCE:
    • 5 min:
      300.0s 0 1032281 3440.9 2.3 2.1 4.5 6.3 65.0
      300.0s 0 1005184 3350.6 2.4 2.2 4.5 6.6 41.9
    • 15 min:
      900.0s 0 3415966 3795.5 2.1 1.8 4.1 6.3 201.3
      900.0s 0 3544314 3938.1 2.0 1.8 3.9 5.8 121.6
    • 30 min:
      1800.0s 0 7276065 4042.3 2.0 1.8 3.8 5.8 159.4
      1800.0s 0 7265501 4036.4 2.0 1.8 3.8 5.8 151.0
  • AWS:
    • 5 min:
      300.0s 0 1686894 5623.0 1.4 1.4 2.2 3.9 209.7
      300.0s 0 1701750 5672.5 1.4 1.4 2.1 3.8 209.7
    • 15 min:
      900.0s 0 5313930 5904.4 1.4 1.3 2.0 3.7 209.7
      900.0s 0 5291768 5879.7 1.4 1.4 2.0 3.7 33.6
    • 30 min:
      1800.0s 0 10571789 5873.2 1.4 1.4 2.0 3.5 33.6
      1800.0s 0 10654376 5919.1 1.3 1.3 2.0 3.5 209.7

In any case, I believe this to be a beneficial change overall (from the tech debt perspective).

@yuzefovich yuzefovich requested a review from michae2 May 10, 2021 16:59
@yuzefovich (Member Author):
Assigning @michae2 as the main reviewer, but it'd be great to get input from @jordanlewis too.

@michae2 (Collaborator) left a comment:

Reviewed 8 of 8 files at r1, 19 of 19 files at r2, 16 of 16 files at r3, 15 of 18 files at r4, 41 of 41 files at r5, 19 of 19 files at r6, 17 of 17 files at r7, 18 of 18 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colflow/flow_coordinator.go, line 62 at r8 (raw file):

	cancelFlow func() context.CancelFunc,
) *FlowCoordinator {
	f := flowCoordinatorPool.Get().(*FlowCoordinator)

Where do we call flowCoordinatorPool.Put?


pkg/sql/colflow/flow_coordinator.go, line 122 at r8 (raw file):

Quoted 5 lines of code…
	if err := colexecerror.CatchVectorizedRuntimeError(func() {
		f.input.Start(ctx)
	}); err != nil {
		f.MoveToDraining(err)
	}

Is this necessary? Won't there always be a Materializer somewhere beneath this in the stack, already catching these panics?


pkg/sql/colflow/flow_coordinator.go, line 152 at r8 (raw file):

Quoted 4 lines of code…
	if err := colexecerror.CatchVectorizedRuntimeError(f.nextAdapter); err != nil {
		f.MoveToDraining(err)
		return nil, f.DrainHelper()
	}

Same question: Won't there always be a Materializer somewhere beneath this catching these panics?

@yuzefovich (Member Author):
Found a bug with flowCoordinator.Release - never putting the object back into the pool. I will rerun the benchmarks hoping to see some performance improvements.

Thanks for the review!

Just noticed that you also spotted the bug with Release :) Will address your comments a bit later.

@yuzefovich (Member Author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/colflow/flow_coordinator.go, line 62 at r8 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Where do we call flowCoordinatorPool.Put?

Nice catch! Fixed. (I also found it while working on #50857).
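The pooling pattern at issue can be illustrated with a minimal, self-contained sketch (hypothetical names and fields; the real FlowCoordinator carries much more state). The bug spotted in review is exactly the kind this pattern invites: forgetting the Put call compiles and runs fine, but the pool is never replenished, so every construction allocates a fresh object.

```go
// Sketch of the sync.Pool object-reuse pattern under discussion.
package main

import (
	"fmt"
	"sync"
)

type FlowCoordinator struct {
	// Per-query state would live here.
	cancelFlow func()
}

var flowCoordinatorPool = sync.Pool{
	New: func() interface{} { return &FlowCoordinator{} },
}

// NewFlowCoordinator takes an object from the pool and initializes it.
func NewFlowCoordinator(cancel func()) *FlowCoordinator {
	f := flowCoordinatorPool.Get().(*FlowCoordinator)
	f.cancelFlow = cancel
	return f
}

// Release zeroes the object (so any references it held can be garbage
// collected) and returns it to the pool. Omitting the Put call is the
// bug found in review: nothing fails loudly, reuse just never happens.
func (f *FlowCoordinator) Release() {
	*f = FlowCoordinator{}
	flowCoordinatorPool.Put(f)
}

func main() {
	f := NewFlowCoordinator(func() {})
	f.Release()
	fmt.Println("coordinator released back to the pool")
}
```

Note that sync.Pool gives no guarantee that a Put object is the one returned by the next Get, so correctness can't be asserted on pointer identity; the benefit shows up as reduced allocations under load.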


pkg/sql/colflow/flow_coordinator.go, line 122 at r8 (raw file):

Previously, michae2 (Michael Erickson) wrote…
	if err := colexecerror.CatchVectorizedRuntimeError(func() {
		f.input.Start(ctx)
	}); err != nil {
		f.MoveToDraining(err)
	}

Is this necessary? Won't there always be a Materializer somewhere beneath this in the stack, already catching these panics?

No, there might not be - if we don't have any wrapped processors (i.e. the whole flow consists only of colexecop.Operators), then there won't be any materializers. We always have to put the catcher in the root components (the flow coordinator, the outbox, the hash router), but we also have the catcher in the parallel unordered synchronizer because it spins up separate goroutines.


pkg/sql/colflow/flow_coordinator.go, line 152 at r8 (raw file):

Previously, michae2 (Michael Erickson) wrote…
	if err := colexecerror.CatchVectorizedRuntimeError(f.nextAdapter); err != nil {
		f.MoveToDraining(err)
		return nil, f.DrainHelper()
	}

Same question: Won't there always be a Materializer somewhere beneath this catching these panics?

Same as above.

@michae2 (Collaborator) left a comment:

OK, :lgtm:

Reviewed 1 of 1 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colflow/flow_coordinator.go, line 62 at r8 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Nice catch! Fixed. (I also found it while working on #50857).

OK. Great minds think alike!


pkg/sql/colflow/flow_coordinator.go, line 122 at r8 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

No, there might not be - if we don't have any wrapped processors (i.e. the whole flow consists only of colexecop.Operators), then there won't be any materializers. We always have to put the catcher in the root components (the flow coordinator, the outbox, the hash router), but we also have the catcher in the parallel unordered sync because it spins up separate goroutines.

Ah, I was only thinking of the flow on the gateway node (it always has a Materializer, right?). Not other nodes. Thank you!

@yuzefovich (Member Author) left a comment:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colflow/flow_coordinator.go, line 122 at r8 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Ah, I was only thinking of the flow on the gateway node (it always has a Materializer, right?). Not other nodes. Thank you!

Oh, sorry, what I said was partially wrong. (I'm currently working on a batch flow coordinator where we will not have a materializer in the scenario I described above.)

With this PR it is possible to have a wrapped row-execution processor as the root, with a flow coordinator planned on top of it. Consider a chain like flow coordinator -> zigzag joiner (the zigzag joiner is just an example of a processor that can read from two tables at once). In this case we will not have a materializer, so we must have a panic catcher in the coordinator. However, when the flow consists only of colexecop.Operators, we will have a root materializer, so the panic catcher in the coordinator is redundant in that case - but the cost of that extra panic catcher is negligible.

@yuzefovich (Member Author):
I'll go ahead and merge this, but I still want to run the benchmarks on this change.

TFTR!

bors r+

@craig (craig bot, Contributor) commented May 15, 2021

Build succeeded:

@craig craig bot merged commit 09b51b5 into cockroachdb:master May 15, 2021
@yuzefovich yuzefovich deleted the flow-coordinator branch May 15, 2021 05:31
@yuzefovich (Member Author):
The new benchmarks also didn't show a noticeable difference on kv0.

GCE:

tail -n 1 kv-old-5m.log
  300.0s        0        1163172         3877.2      2.1      1.8      3.9      5.5     41.9  
tail -n 1 kv-new-5m.log
  300.0s        0        1192014         3973.4      2.0      1.8      3.8      5.2     56.6  

tail -n 1 kv-old-5m.log
  300.0s        0        1149873         3832.9      2.1      1.9      3.9      5.5     56.6  
tail -n 1 kv-new-5m.log
  300.0s        0        1156698         3855.7      2.1      1.8      3.9      5.2     60.8
  
tail -n 1 kv-old-10m.log
  600.0s        0        2551546         4252.6      1.9      1.6      3.7      5.0     67.1  
tail -n 1 kv-new-10m.log
  600.0s        0        2618847         4364.7      1.8      1.6      3.7      5.0     71.3  

tail -n 1 kv-old-15m.log
  900.0s        0        4012546         4458.4      1.8      1.6      3.5      4.7     50.3  
tail -n 1 kv-new-15m.log
  900.0s        0        3890579         4322.9      1.8      1.6      3.7      5.0    151.0  

AWS:

tail -n 1 kv-old-5m.log
  300.0s        0        1101252         3670.8      2.2      2.0      4.1      5.8     58.7  
tail -n 1 kv-new-5m.log
  300.0s        0        1134329         3781.1      2.1      1.9      3.9      5.5     75.5  

tail -n 1 kv-old-5m.log
  300.0s        0        1139301         3797.7      2.1      1.9      3.9      5.5     75.5  
tail -n 1 kv-new-5m.log
  300.0s        0        1122278         3740.9      2.1      1.9      4.1      5.5     67.1  

tail -n 1 kv-old-10m.log
  600.0s        0        2552290         4253.8      1.9      1.6      3.7      5.0     60.8  
tail -n 1 kv-new-10m.log
  600.0s        0        2553480         4255.8      1.9      1.6      3.7      5.0     65.0  

tail -n 1 kv-old-15m.log
  900.0s        0        3891143         4323.5      1.8      1.6      3.7      5.0    318.8  
tail -n 1 kv-new-15m.log
  900.0s        0        3926556         4362.8      1.8      1.6      3.7      5.0    125.8  
