
streamingccl: hook up tracing aggregator events for the C2C job #108458

Merged

Conversation


@adityamaru adityamaru commented Aug 9, 2023

This change builds on top of #107994 and wires up each stream
ingestion data processor to emit TracingAggregatorEvents to
the frontier and subsequently the job coordinator.
These events are periodically flushed to files in the job_info
table and are consumable via the DBConsole Job Details page.

Currently, the only aggregator event that is propagated is the
IngestionPerformanceStats emitted by the sst batcher.

Informs: #108374
Fixes: #100126
Release note: None
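
As a rough sketch of the coordinator-side flush described above, assuming a `WriteChunkedFileToJobInfo`-style helper in `pkg/jobs` (the filename scheme and marshaling here are illustrative assumptions, not the exact code in this PR):

```go
import (
	"context"
	"fmt"

	"github.com/cockroachdb/cockroach/pkg/jobs"
	"github.com/cockroachdb/cockroach/pkg/jobs/jobspb"
	"github.com/cockroachdb/cockroach/pkg/sql/isql"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

// flushAggregatorStats persists one serialized snapshot of the collected
// TracingAggregatorEvents as a timestamped file in the job_info table, where
// the DBConsole Job Details page can later list and download it.
func flushAggregatorStats(
	ctx context.Context, db isql.DB, jobID jobspb.JobID, data []byte,
) error {
	return db.Txn(ctx, func(ctx context.Context, txn isql.Txn) error {
		filename := fmt.Sprintf("aggregatorstats.%d.binpb", timeutil.Now().UnixNano())
		return jobs.WriteChunkedFileToJobInfo(ctx, filename, data, txn, jobID)
	})
}
```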

@adityamaru adityamaru requested review from a team as code owners August 9, 2023 16:45
@adityamaru adityamaru requested review from a team August 9, 2023 16:45
@adityamaru adityamaru requested a review from a team as a code owner August 9, 2023 16:45
@adityamaru adityamaru requested review from abarganier and yuzefovich and removed request for a team August 9, 2023 16:45
@cockroach-teamcity
Member

This change is Reviewable

@adityamaru adityamaru removed request for a team, abarganier and yuzefovich August 9, 2023 16:46
@adityamaru
Contributor Author

The first three commits are from #107994 and #108359. Only the last commit is relevant.

Collaborator

@stevendanna stevendanna left a comment


This looks good, but it prompted another thought about some of the PRs it builds on. Once those PRs are in, I think this should be good to go.

@@ -313,7 +314,9 @@ func canWrap(mode sessiondatapb.VectorizeExecMode, core *execinfrapb.ProcessorCo
case core.RestoreData != nil:
case core.Filterer != nil:
case core.StreamIngestionData != nil:
return errStreamIngestionWrap
Collaborator

Interesting. I read the linked issue and understand why we need this. I'm a little surprised we don't need this for the restore case though.

Contributor Author

I'm surprised too; let me dig some more.

Contributor Author

I think this has something to do with the fact that in c2c the stream ingestion processors push the meta downstream to another processor, the frontier processor. In restore, however, the restore data processor is the leaf processor and pushes metas to the row result writer. I'll add some logging to sanity-check that the restore data processors aren't doing any unwanted buffering of progress metas.
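
For context, the diff under review opts the stream ingestion core out of vectorized wrapping. A rough sketch of that pattern, with most cases elided (my reading of the linked issue is that a wrapped processor's periodic metas would be buffered rather than forwarded eagerly; treat that rationale as an assumption):

```go
import (
	"github.com/cockroachdb/cockroach/pkg/sql/execinfrapb"
	"github.com/cockroachdb/errors"
)

// errStreamIngestionWrap marks the stream ingestion core as unwrappable, so
// planning falls back to the row-based engine for this processor and its
// periodic aggregator metas reach the consumer as they are produced.
var errStreamIngestionWrap = errors.New(
	"core.StreamIngestionData is not supported in the vectorized engine")

// canWrap (heavily simplified) reports whether a row-based processor core
// may be wrapped and run inside a vectorized flow.
func canWrap(core *execinfrapb.ProcessorCoreUnion) error {
	switch {
	case core.StreamIngestionData != nil:
		return errStreamIngestionWrap
	default:
		return nil
	}
}
```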

@adityamaru adityamaru force-pushed the hookup-stream-ingestion-aggregator branch from 6370e46 to 53f69c9 on August 24, 2023 23:48
@adityamaru adityamaru requested a review from a team as a code owner August 24, 2023 23:48
@adityamaru adityamaru requested review from michae2 and removed request for a team August 24, 2023 23:48

blathers-crl bot commented Aug 24, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.

@adityamaru adityamaru force-pushed the hookup-stream-ingestion-aggregator branch from 53f69c9 to 21f4527 on August 25, 2023 13:35
@adityamaru adityamaru force-pushed the hookup-stream-ingestion-aggregator branch from 21f4527 to f0cac6f on August 25, 2023 15:27
@adityamaru adityamaru force-pushed the hookup-stream-ingestion-aggregator branch from f0cac6f to da18c77 on August 25, 2023 15:39
@adityamaru
Contributor Author

bors r=stevendanna

@craig
Contributor

craig bot commented Aug 28, 2023

Build succeeded.

@craig craig bot merged commit 9eb73a8 into cockroachdb:master Aug 28, 2023
7 of 8 checks passed
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Aug 30, 2023
The motivation for this change was to fix the leaked goroutine in cockroachdb#109658, but it led to a larger cleanup, as explained below.

In cockroachdb#108458 and other PRs we taught the processors of backup, restore and c2c to periodically send metas containing the TracingAggregatorEvents, for the coordinator to then process and persist for improved observability. In that logic we were defensive against a scenario in which the processor received a root context with no tracing span, in which case we never initialized the tracing aggregator on the processor. This should never be possible in production code, but we want to avoid failing the replication job if it were to happen. As part of this defense we also returned `nil` instead of the meta in `Next()`. What we didn't realize was that returning a nil row and a nil meta indicates that the processor has been completely drained, and causes it to pre-emptively drain and shut down.

With this change, if the root context does not have a tracing span and we are unable to init a tracing aggregator, we simply do not allow the timer controlling the return of the TracingAggregatorEvents to fire. By doing this, we avoid a situation where a nil row and nil meta are returned too early.

As part of this cleanup, this change also simplifies the tracing aggregator. The aggregator is no longer responsible for creating a child span and registering a listener; instead it simply returns an object that can be registered as an event listener by the caller. The former approach was necessary when we wanted to decorate the span with LazyTags, but nobody uses LazyTags and we have a much better workflow for consuming aggregator events with the `Advanced Debugging` job details page.

This simplification was motivated by the fact that `processorbase.StartInternal` now takes event listeners that can be registered with the tracing span of the context that hangs off the processor and is accessible via the `processor.Ctx()` method.

Fixes: cockroachdb#109658
Release note: None
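
For readers unfamiliar with the execinfra contract, a minimal sketch of the fixed behavior (field names like `aggTimer` and `rowCh` are hypothetical, not the real processor's state machine):

```go
import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/sql/execinfrapb"
	"github.com/cockroachdb/cockroach/pkg/sql/rowenc"
)

// frontierProcessor is a stand-in for the real processor; only the fields
// needed for the sketch are shown.
type frontierProcessor struct {
	// aggTimer is armed only after the tracing aggregator is successfully
	// initialized; with no tracing span it simply never fires.
	aggTimer *time.Timer
	rowCh    chan rowenc.EncDatumRow
}

func (p *frontierProcessor) makeAggregatorEventsMeta() *execinfrapb.ProducerMetadata {
	return &execinfrapb.ProducerMetadata{} // payload elided in this sketch
}

// Next sketches the fixed contract: returning a nil row and a nil meta tells
// the consumer the processor is completely drained, so that return must not
// be reachable via the timer path before real work has finished.
func (p *frontierProcessor) Next() (rowenc.EncDatumRow, *execinfrapb.ProducerMetadata) {
	select {
	case <-p.aggTimer.C:
		// Reachable only when the aggregator was initialized: emit the
		// buffered TracingAggregatorEvents as a metadata record.
		return nil, p.makeAggregatorEventsMeta()
	case row, ok := <-p.rowCh:
		if ok {
			return row, nil
		}
	}
	return nil, nil // fully drained
}
```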
@adityamaru adityamaru deleted the hookup-stream-ingestion-aggregator branch August 30, 2023 18:57
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Aug 31, 2023
craig bot pushed a commit that referenced this pull request Aug 31, 2023
109378: backupccl: avoid splitting if the split point might be unsafe r=lidorcarmel a=lidorcarmel

Restore may use unsafe keys as split points, which can cause splits between the column families of a single row; that, in turn, can cause SQL to fail when reading the row or, worse, return wrong results.

This commit avoids splitting on keys that might be unsafe.

See the issue for more info.

Epic: none
Informs: #109483

Release note: None.
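
A sketch of the guard this implies, built on the existing `keys.EnsureSafeSplitKey` helper (the actual call site and fallback behavior in the restore code may differ):

```go
import (
	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// safeSplitPoint truncates a candidate split key to a row boundary so a
// split can never land between the column families of a single SQL row;
// if the key cannot be made safe, the caller skips the split entirely.
func safeSplitPoint(candidate roachpb.Key) (roachpb.Key, bool) {
	safe, err := keys.EnsureSafeSplitKey(candidate)
	if err != nil || len(safe) == 0 {
		return nil, false // not a decodable SQL row key: don't split here
	}
	return safe, true
}
```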

109631: sql: serialize under stress and unskip TestQueryCache r=cucaroach a=cucaroach

This test would time out under stress+race because it uses t.Parallel to run 15 tests concurrently, each of which fires up a TestServer. Fix the timeout by running the tests serially under stress.

Fixes: #105174
Epic: none
Release note: none
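
A minimal sketch of the pattern, with an `underStress` flag standing in for CockroachDB's stress detection:

```go
import "testing"

// runSubtests runs each case in parallel only when not under stress; under
// stress each TestServer is expensive enough that 15 concurrent instances
// time out, so the cases run serially instead.
func runSubtests(t *testing.T, cases []string, underStress bool) {
	for _, name := range cases {
		name := name // capture for the closure (pre-Go 1.22 idiom)
		t.Run(name, func(t *testing.T) {
			if !underStress {
				// Safe to overlap: each subtest gets its own TestServer.
				t.Parallel()
			}
			_ = name // ... start a TestServer and exercise the query cache ...
		})
	}
}
```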


109683: spanconfig: skip protected timestamps on non-table data  r=arulajmani a=aadityasondhi

Previously, when targeting a cluster backup we placed a protected timestamp using the `EverythingSpan`, which covered the entire keyspace. This was non-ideal because not all of the keyspace is included in the backup. It is especially problematic for high-churn ranges, such as node liveness and timeseries, which can accumulate lots of MVCC garbage very quickly. Placing a protected timestamp on these ranges, and thus preventing MVCC GC from running, can cause problems.

This patch introduces a new span that covers the keyspace excluded from
backup. When we encounter a span that is within those bounds, we skip
placing a protected timestamp on it.

Fixes: #102338

Release note: None
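
A rough sketch of the skip logic, with a hypothetical `excludeFromBackup` span standing in for the new keyspace bounds the patch introduces:

```go
import "github.com/cockroachdb/cockroach/pkg/roachpb"

// shouldProtect reports whether a protected timestamp record should cover sp:
// spans that fall entirely inside the keyspace excluded from backup (e.g.
// node liveness and timeseries ranges) are skipped so MVCC GC keeps running.
func shouldProtect(sp, excludeFromBackup roachpb.Span) bool {
	return !excludeFromBackup.Contains(sp)
}
```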

109720: cluster-ui: break circular import of `rootActions` r=xinhaoz a=xinhaoz

Importing `rootActions` from `reducers.ts` in cluster-ui created a circular import, which prevented one of the redux fields involved in the cycle from having its reducer populated and omitted that field from the store altogether. Currently, this field is `uiConfig`, which affects features that rely on checking a user's SQL role, such as displaying the `Reset SQL Stats` button.

This commit extracts `rootActions` into its own file to remove the cyclic dependency.

Fixes: #97996

Release note (bug fix): On CC, the `Reset SQL Stats` button is now visible if the user has the admin role.

Using a production build of the 23.1 cluster-ui version: (screenshot: https://github.com/cockroachdb/cockroach/assets/20136951/348a941a-3be5-42ad-8b16-47cbf48f3f19)

109734: bulk,ccl: refactor tracing aggregator integration r=stevendanna a=adityamaru

The motivation for this change was to fix the leaked goroutine in #109658, but it led to a larger cleanup, as explained below.

In #108458 and other PRs we taught the processors of backup, restore and c2c to periodically send metas containing the TracingAggregatorEvents, for the coordinator to then process and persist for improved observability. In that logic we were defensive against a scenario in which the processor received a root context with no tracing span, in which case we never initialized the tracing aggregator on the processor. This should never be possible in production code, but we want to avoid failing the replication job if it were to happen. As part of this defense we also returned `nil` instead of the meta in `Next()`. What we didn't realize was that returning a nil row and a nil meta indicates that the processor has been completely drained, and causes it to pre-emptively drain and shut down.

With this change, if the root context does not have a tracing span and we are unable to init a tracing aggregator, we simply do not allow the timer controlling the return of the TracingAggregatorEvents to fire. By doing this, we avoid a situation where a nil row and nil meta are returned too early.

As part of this cleanup, this change also simplifies the tracing aggregator. The aggregator is no longer responsible for creating a child span and registering a listener; instead it simply returns an object that can be registered as an event listener by the caller. The former approach was necessary when we wanted to decorate the span with LazyTags, but nobody uses LazyTags and we have a much better workflow for consuming aggregator events with the `Advanced Debugging` job details page.

This simplification was motivated by the fact that `processorbase.StartInternal` now takes event listeners that can be registered with the tracing span of the context that hangs off the processor and is accessible via the `processor.Ctx()` method.

Fixes: #109658
Release note: None

Co-authored-by: Lidor Carmel <lidor@cockroachlabs.com>
Co-authored-by: Tommy Reilly <treilly@cockroachlabs.com>
Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Co-authored-by: adityamaru <adityamaru@gmail.com>
Development

Successfully merging this pull request may close these issues.

jobsprofiler: periodically persist aggregated stats during job execution