
colexec: add streaming metadata propagation #65586

Open · wants to merge 5 commits into master from streaming-meta
Conversation


@yuzefovich yuzefovich commented May 22, 2021

colmeta: introduce new package to support streaming meta propagation

This commit introduces a couple of interfaces into the colexecop package
(currently unused) that will be implemented by the root components of
vectorized flows (both flow coordinators and outboxes) in order to
propagate metadata in a streaming fashion.

It also introduces a new package, colmeta, that contains a utility
component implementing the logic of intertwining pieces of data (like
coldata.Batches and rowenc.EncDatumRows) with requests to propagate
metadata in a streaming fashion.

The utility handler is designed to be used as follows:

  • there is a separate goroutine (DataProducer) reading from the input
    to the root component and pushing pieces of data onto a channel in a
    synchronous manner. This goroutine is blocked so that it does not
    request more data from the input until necessary.
  • there is an arbitrary number of goroutines
    (StreamingMetadataProducers) that want to propagate metadata in
    a streaming fashion.
  • there is a main goroutine (DataConsumer) of the root component
    responsible for pushing the data, intertwined with streaming meta, to
    the output of the root component (a minimal sketch of this coordination
    follows below).

Release note: None
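For concreteness, here is a minimal, self-contained Go sketch of how the three roles above could coordinate over channels. All names here (dataCh, unblockProducer, the msg struct) are illustrative assumptions and do not match the actual colmeta identifiers; the real handler also deals with contexts, draining, and error propagation that this sketch omits.

	package main

	import (
		"fmt"
		"sync"
	)

	// msg stands in for either a piece of data (a coldata.Batch or
	// rowenc.EncDatumRow) or a piece of streaming metadata.
	type msg struct {
		data string
		meta string
	}

	func main() {
		dataCh := make(chan msg)                  // data and streaming meta, intertwined
		unblockProducer := make(chan struct{}, 1) // consumer tells producer to read more

		var wg sync.WaitGroup
		wg.Add(2)

		// DataProducer: reads from the input and pushes synchronously, waiting
		// for the consumer to acknowledge before requesting more data.
		go func() {
			defer wg.Done()
			for i := 0; i < 3; i++ {
				dataCh <- msg{data: fmt.Sprintf("batch %d", i)}
				<-unblockProducer
			}
		}()

		// StreamingMetadataProducer: pushes metadata whenever it has some,
		// independently of the data flow.
		go func() {
			defer wg.Done()
			dataCh <- msg{meta: "scan progress: 42 rows read"}
		}()

		// Close the channel only once all producers are done.
		go func() {
			wg.Wait()
			close(dataCh)
		}()

		// DataConsumer (the root component's main goroutine): pushes everything
		// to the output and unblocks the producer after each piece of data.
		for m := range dataCh {
			fmt.Println("pushed to output:", m)
			if m.data != "" {
				unblockProducer <- struct{}{}
			}
		}
	}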

colflow: utilize streaming metadata handler in root components

This commit refactors flow coordinators and outboxes to use the colmeta
package in order to support propagating metadata in a streaming fashion.
This required the introduction of another goroutine in all root components
(so the outbox now needs two goroutines in addition to the one in which it
is running).

Notably, a refactor of the row flow coordinator was needed so that it no
longer relies on execinfra.ProcessorBase utilities, because that struct
assumes that all methods are called from a single goroutine, an assumption
this commit breaks. As a result, flowCoordinatorBase was extracted to
contain the common logic of both the row and batch flow coordinators
(and the former no longer uses ProcessorBase).

Currently, only ColBatchScans and Inboxes propagate some metadata in
a streaming fashion (the former sends scan progress metadata, so we now
have the same query progress reporting in the vectorized engine as we do
in the row engine).

Fixes: #55758.

Release note (sql change): Queries executed via the vectorized engine
now display their progress in the phase column of SHOW QUERIES.
Previously, this feature was only available in the row-by-row engine.

@cockroach-teamcity

This change is Reviewable

@yuzefovich yuzefovich force-pushed the streaming-meta branch 5 times, most recently from 6063d65 to de9afd2 on May 28, 2021 17:59
@yuzefovich yuzefovich force-pushed the streaming-meta branch 22 times, most recently from 9f8a225 to fcdeadb on June 6, 2021 00:07
@yuzefovich yuzefovich force-pushed the streaming-meta branch 2 times, most recently from db910f7 to 2121ade on June 10, 2021 16:37
@yuzefovich yuzefovich force-pushed the streaming-meta branch 2 times, most recently from 269f298 to 9145844 on June 10, 2021 18:24
@yuzefovich yuzefovich marked this pull request as ready for review June 10, 2021 18:25
@yuzefovich yuzefovich requested review from michae2, jordanlewis and a team June 10, 2021 18:25
@yuzefovich

I think this is RFAL.

I'm also very open to suggestions on the naming here (of the new package, interfaces, handler, etc), so please let me know if you have any ideas.

@yuzefovich yuzefovich force-pushed the streaming-meta branch 2 times, most recently from 0be1fa9 to 8924ae6 on June 14, 2021 18:24

@jordanlewis jordanlewis left a comment


I had a look, nice work on this.

I am concerned about the complexity of the careful dance of locks and channels that must be performed. Is there some way that this code can be simplified? We know that we've seen some trouble here before and I'm not looking forward to the day that we have to debug a stuck goroutine on one of the channels...

From a high level, instead of adding a third goroutine that has to be carefully synchronized with producerBlock, is there some way that we could conceivably have the main goroutine itself alternate between the Batch/row stream and incoming streamed metadata? Or is there some issue with that which might cause a deadlock or something?

I'm also wondering, is there a performance cost to this change? We'll now have to do several(?) more channel sends per row (in the row engine) and per batch (in vectorized). Isn't it correct that every row/batch causes a synchronization on the producerBlock channel?

I'll take another look soon. It takes some time to wrap my head around the logic.

Reviewed 7 of 7 files at r1, 1 of 1 files at r2, 5 of 5 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @michae2, and @yuzefovich)


pkg/sql/colfetcher/colbatch_scan.go, line 133 at r4 (raw file):

		meta.Metrics = execinfrapb.GetMetricsMeta()
		meta.Metrics.RowsRead = sinceLastUpdate
		if err := s.streamingMetaReceiver.PushStreamingMeta(s.Ctx, meta); err != nil {

What if the error was a context cancellation or another event that we would rather intercept and shut down because of? Is it safe to swallow this error?


pkg/sql/colflow/flow_coordinator.go, line 150 at r4 (raw file):

	// communicating all the data to the consumer goroutine (the current one).
	// TODO(yuzefovich): consider using stopper to run this goroutine.
	go func(flowCtx context.Context) {

Why are we not using the stopper here?


pkg/sql/colflow/flow_coordinator.go, line 154 at r4 (raw file):

		defer f.producer.ProducerDone()
		if err := f.producer.WaitForConsumer(flowCtx); err != nil {
			return

Probably we should log this error, rather than swallow it completely.


pkg/sql/colflow/flow_coordinator.go, line 317 at r4 (raw file):

		meta.Err = err
		exit = f.producer.SendMeta(ctx, meta) != nil
		return

nit: using naked returns and named return values is generally considered an antipattern for readability; I would prefer to always see `return drain, exit` or just `return false, f.producer.SendMeta(ctx, meta) != nil`.


pkg/sql/colflow/colmeta/streaming_meta.go, line 23 at r3 (raw file):

)

// The outline of how the interfaces in this file are designed to be used.

Nice documentation!

I think it would be helpful to include one more section before the diagram: what is the high level summary of the colmeta package?


pkg/sql/colflow/colmeta/streaming_meta.go, line 48 at r4 (raw file):

	// the consumer arrives. A context cancellation error can be returned in
	// which case the producer should exit right away.
	WaitForConsumer(context.Context) error

I'm not sure this is idiomatic - it seems like a lot of interfaces prefer to give you a channel to wait on, like ctx.Done() does.
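For illustration, a tiny sketch of the channel-based alternative being suggested here (consumerArrived is a hypothetical name, not part of the actual interface):

	package sketch // hypothetical package, for illustration only

	import "context"

	// waitForConsumer shows the ctx.Done()-style shape: the handler exposes a
	// channel that is closed once the consumer arrives, and the producer
	// selects on it together with the context.
	func waitForConsumer(ctx context.Context, consumerArrived <-chan struct{}) error {
		select {
		case <-consumerArrived:
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}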


pkg/sql/execinfra/scanbase.go, line 22 at r4 (raw file):

// ScanProgressFrequency determines how often the scan operators should emit
// the metadata about how many rows they have read.

Is this in rows? Maybe add a comment?


@yuzefovich yuzefovich left a comment


I am concerned about the complexity of the careful dance of locks and channels that must be performed. Is there some way that this code can be simplified?

I share your concern, but I couldn't think of anything simpler... I'm keeping my fingers crossed so that

the day that we have to debug a stuck goroutine on one of the channels...

never comes because we're careful to always listen on the context when performing sends/recv's on the channels. There are a couple of exceptions to this strategy:

  • we don't listen for context cancellation from the DataProducer goroutine when sending on the producerBlock channel, because that channel is buffered and we never attempt another send before the previous one has been received (see the small sketch below)
  • we don't listen for context cancellation from the DataProducer goroutine when recv'ing from the nextCh channel, because we rely on the DataConsumer goroutine to notice the context cancellation and close that channel properly.
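For reference, a small sketch of the two patterns described above (channel names and types are illustrative only):

	package sketch // hypothetical package, for illustration only

	import "context"

	// ctxAwareSend is the general rule: every channel send also listens on the
	// context so a canceled flow cannot leave a goroutine stuck on the send.
	func ctxAwareSend(ctx context.Context, ch chan<- struct{}) error {
		select {
		case ch <- struct{}{}:
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}

	// bufferedSend is the exception: if the channel has capacity 1 and there is
	// never more than one outstanding send, the plain send cannot block, so no
	// ctx case is needed.
	func bufferedSend(ch chan<- struct{}) {
		ch <- struct{}{}
	}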

From a high level, instead of adding a 3rd goroutine that has to be carefully synchronized with producerBlock, is there some way that we could conceivably have the main goroutine alternate from Batch/row to incoming streamed metadata itself? Or is there some issue with that which might cause a deadlock or something?

What exactly do you have in mind?

My original idea from a year ago was to have the root component periodically do a non-blocking poll of something (like a separate channel for streaming metadata) every time before the component pushes a row/batch to its output. It would be something like

func (f *BatchFlowCoordinator) Run() {
  for {
    nextBatch := f.input.Next()
    // Push all streaming meta we have accumulated so far.
  LOOP:
    for {
      select {
      case meta := <-f.streamingMeta:
        f.output.PushBatch(nil /* batch */, meta)
      default:
        break LOOP
      }
    }
    f.output.PushBatch(nextBatch, nil /* meta */)
  }
}

However, I realized that such a strategy won't work when the flow doesn't produce any data (e.g. automatic stats collection doesn't push anything to the flow coordinator AFAIU), so I abandoned this idea.

I'm also wondering, is there a performance cost to this change? We'll now have to do several(?) more channel sends per row (in the row engine) and per batch (in vectorized).

Yeah, I expect this change to have some performance cost, and I haven't run any benchmarks (but will kick them off tomorrow). Note that the row-by-row engine is not affected by this change at all, but in the vectorized engine - yes, we have an extra channel send per row (if a processor is at the root) or per batch (if an operator is at the root).

Isn't it correct that every row/batch causes a synchronization on the producerBlock channel?

Yes, that's correct, but in my mind it is the desired behavior given that we don't want to allow for the DataProducer goroutine to proceed processing more data, in the general case, until the DataConsumer tells it to.

However, now that I'm thinking about this, this might be a poor choice. The case I was originally concerned about was a query with a limit: I didn't want the DataProducer to produce more than necessary. But the tree of operators is aware of the limit itself (unlike the DistSQLReceiver), so now I think it makes sense to let the DataProducer run eagerly. What do you think?

We will need to add some allocations (of nextChMsgs), but we can sync.Pool them since we have very clear lifetimes.
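For example, a sketch of how such pooling could look (the nextChMsg field layout here is an assumption, not the actual struct in the PR):

	package sketch // hypothetical package, for illustration only

	import (
		"sync"

		"github.com/cockroachdb/cockroach/pkg/col/coldata"
		"github.com/cockroachdb/cockroach/pkg/sql/execinfrapb"
	)

	// nextChMsg is the assumed shape of the messages sent on the data channel.
	type nextChMsg struct {
		batch coldata.Batch
		meta  *execinfrapb.ProducerMetadata
	}

	var nextChMsgPool = sync.Pool{
		New: func() interface{} { return &nextChMsg{} },
	}

	// getMsg is called on the producer side to wrap a batch into a pooled message.
	func getMsg(b coldata.Batch) *nextChMsg {
		m := nextChMsgPool.Get().(*nextChMsg)
		m.batch, m.meta = b, nil
		return m
	}

	// putMsg is called on the consumer side once the contents have been pushed
	// to the output, returning the message to the pool.
	func putMsg(m *nextChMsg) {
		*m = nextChMsg{}
		nextChMsgPool.Put(m)
	}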

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @michae2)


pkg/sql/colfetcher/colbatch_scan.go, line 133 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

What if the error was a context cancellation or another event that we would rather intercept and shut down because of? Is it safe to swallow this error?

In this case it is safe to swallow the error; I added a comment explaining why. We could inspect the error here and, if it is a context cancellation, panic, but I don't think it's worth it. Let me know if you think otherwise.


pkg/sql/colflow/flow_coordinator.go, line 150 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Why are we not using the stopper here?

I think it is just an oversight. We have a couple of places where we spin up new goroutines (the parallel unordered synchronizer, the hash router, the outbox, and now the flow coordinators), and I want to switch them all to the stopper at the same time since there might be some plumbing to do, and I don't want to derail this PR.


pkg/sql/colflow/flow_coordinator.go, line 154 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Probably we should log this error, rather than swallow it completely.

Done.


pkg/sql/colflow/colmeta/streaming_meta.go, line 23 at r3 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Nice documentation!

I think it would be helpful to include one more section before the diagram: what is the high level summary of the colmeta package?

Done.


pkg/sql/colflow/colmeta/streaming_meta.go, line 48 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'm not sure this is idiomatic - it seems like a lot of interfaces prefer to give you a channel to wait on, like ctx.Done() does.

Refactored (in a temporary WIP commit that I'll split up and squash later - just want to make sure that my understanding is sound and nothing breaks).


pkg/sql/execinfra/scanbase.go, line 22 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Is this in rows? Maybe add a comment?

Done.

@yuzefovich

The KV95 performance numbers are depressing (each line below is the final workload summary: elapsed time, errors, total ops, cumulative ops/sec, and avg/p50/p95/p99/pMax latencies in ms):

  • GCE:
tail -n 1 kv-old-5m.log
  300.0s        0        2949911         9833.0      0.8      0.8      1.7      2.6   4563.4  
tail -n 1 kv-new-5m.log
  300.0s        0        2754811         9182.7      0.9      0.9      1.8      2.6     58.7  
tail -n 1 kv-old-10m.log
  600.0s        0        6284128        10473.5      0.8      0.7      1.6      2.4     58.7  
tail -n 1 kv-new-10m.log
  600.0s        0        5567691         9279.5      0.9      0.9      1.8      2.5     48.2  
tail -n 1 kv-old-15m.log
  900.0s        0        9805583        10895.1      0.7      0.7      1.5      2.2     22.0  
tail -n 1 kv-new-15m.log
  900.0s        0        8767418         9741.6      0.8      0.8      1.6      2.4    184.5  
  • AWS:
tail -n 1 kv-old-5m.log
  300.0s        0        3675694        12252.3      0.7      0.7      1.2      1.9    209.7  
tail -n 1 kv-new-5m.log
  300.0s        0        3317439        11058.1      0.7      0.7      1.3      1.8    209.7  
tail -n 1 kv-old-10m.log
  600.0s        0        7554855        12591.4      0.6      0.7      1.2      1.6    209.7  
tail -n 1 kv-new-10m.log
  600.0s        0        6739271        11232.1      0.7      0.7      1.2      1.8    352.3  
tail -n 1 kv-old-15m.log
  900.0s        0       11399183        12665.8      0.6      0.7      1.2      1.6    209.7  
tail -n 1 kv-new-15m.log
  900.0s        0       10120405        11244.9      0.7      0.7      1.2      1.7    335.5  

I guess the next step is to try removing that extra send/recv on each row/batch.

colrpc: take in OpWithMetaInfo in the outbox constructor

This simplifies the signature of the method a bit.

Release note: None

colrpc: clarify a test a bit

This commit clarifies the usage of different contexts in a test since we
use different contexts to simulate remote nodes as well as independent
scenarios.

Release note: None
@yuzefovich

I rebased on top of master and squashed the last WIP commit that modified the interface a bit. Looking into the refactoring to reduce the performance hit now.

@yuzefovich

Ouch.

Isn't it correct that every row/batch causes a synchronization on the producerBlock channel?

Yes, that's correct, but in my mind it is the desired behavior given that we don't want to allow for the DataProducer goroutine to proceed processing more data, in the general case, until the DataConsumer tells it to.

However, now that I'm thinking about this, this might be a poor choice. The original case I was concerned about was if the query had a limit, I didn't want for the DataProducer to produce more than necessary, but the tree of operators is aware of the limit itself (unlike the DistSQLReceiver), so now I think it makes sense to let the DataProducer run eagerly. What do you think?

There is another reason why we have to block the DataProducer goroutine until the DataConsumer goroutine has received the data and communicated it to the output: calls to Next might invalidate the results (meaning that allowing the producer to run eagerly might corrupt data that has been sent to the consumer but hasn't yet been communicated to the output). And I don't know what to do here other than performing a deep copy :/ I'll try this approach, but I'm worried it'll be even worse.
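Roughly, the eager producer would then look something like this (copyBatch is a hypothetical deep-copy helper, not an actual coldata API; no producerBlock synchronization would be needed anymore):

	package sketch // hypothetical package, for illustration only

	import "github.com/cockroachdb/cockroach/pkg/col/coldata"

	// produceEagerly reads from the input and hands off a deep copy of each
	// batch, so that the next call to the input cannot invalidate data that
	// the consumer is still holding.
	func produceEagerly(
		next func() coldata.Batch,
		copyBatch func(coldata.Batch) coldata.Batch, // hypothetical deep-copy helper
		dataCh chan<- coldata.Batch,
	) {
		defer close(dataCh)
		for {
			b := next()
			if b.Length() == 0 {
				// A zero-length batch indicates that the input is exhausted.
				return
			}
			dataCh <- copyBatch(b)
		}
	}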

Am I missing something? Does anyone have other ideas?

@yuzefovich yuzefovich added the do-not-merge label on Jul 15, 2021
@yuzefovich

We are putting this work on the shelf for the time being since I'll be focusing on a higher priority item instead.

The next step for this PR is to fix and polish up the last WIP commit, which removes the extra synchronization on the producerBlock channel by performing a deep copy of the data, and then to measure the performance impact of the PR as a whole.

However, I'm quite pessimistic at this point about pursuing this PR further (I'm guessing that the approach with extra copies will also incur a performance hit on the order of 10% on KV-like workloads, which is unacceptable), and I'm thinking that we might have to change the Operator.Next signature to also return the metadata, always in a streaming fashion. Such a solution would make the Operator interface similar to the RowSource one, should be relatively simple to implement (just with a lot of manual labor), and might actually allow us to simplify some things (namely DrainMeta implementations).
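As a rough sketch of that direction (not the actual colexecop.Operator definition; the context parameter is omitted for brevity), the interface might become:

	package sketch // hypothetical package, for illustration only

	import (
		"github.com/cockroachdb/cockroach/pkg/col/coldata"
		"github.com/cockroachdb/cockroach/pkg/sql/execinfrapb"
	)

	// Operator is what the vectorized operator interface could look like if
	// Next also returned streaming metadata, mirroring execinfra.RowSource in
	// the row engine.
	type Operator interface {
		// Next returns the next batch plus any metadata produced since the
		// previous call; a zero-length batch with nil metadata means exhaustion.
		Next() (coldata.Batch, *execinfrapb.ProducerMetadata)
	}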

craig bot pushed a commit that referenced this pull request Jul 20, 2021
67663: colrpc: minor cleanup r=yuzefovich a=yuzefovich

This PR extracts a couple of commits from #65586 which are not
controversial and seem beneficial in their own right.

**colrpc: take in OpWithMetaInfo in the outbox constructor**

This simplifies the signature of the method a bit.

Release note: None

**colrpc: clarify a test a bit**

This commit clarifies the usage of different contexts in a test since we
use different contexts to simulate remote nodes as well as independent
scenarios.

Release note: None

67686: changefeedccl: retry webhook sink requests upon HTTP error r=spiffyyeng a=spiffyyeng

Before, webhook sink requests simply resulted in retryable changefeed
errors upon failure. This change adds default retry behavior for HTTP
requests, preventing the need for changefeeds to shut down and restart
every time.

Resolves #67312

Release note: None

67735: sqlproxy: fix test flake r=andy-kimball a=andy-kimball

TestDirectoryConnect is flaking when testing that it has only one connection
to the proxy. This flake happens because it takes time for connection
closure to propagate to the server. The connection from the previous
sub-test is still present. The fix is to use require.Eventually to wait
until the previous connection is torn down.

Fixes #67405

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Ryan Min <ryanmin42@gmail.com>
Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Labels
do-not-merge bors won't merge a PR with this label.
Linked issue
colexec: add a way to propagate metadata in a streaming fashion (#55758)
3 participants