Skip to content

sql: add always on query-level SQL CPU tracking#169586

Draft
DrewKimball wants to merge 6 commits intocockroachdb:masterfrom
DrewKimball:drewk/grunning-at-send
Draft

sql: add always on query-level SQL CPU tracking#169586
DrewKimball wants to merge 6 commits intocockroachdb:masterfrom
DrewKimball:drewk/grunning-at-send

Conversation

@DrewKimball
Copy link
Copy Markdown
Collaborator

sql: add grunning-based local KV CPU tracking to fetcher infrastructure

Redesign CPUStopWatch as a stack-allocated value type where Start()
records the grunning timestamp and Stop() returns the elapsed CPU
duration. Callers atomically accumulate the delta themselves. All
methods check grunning.Supported internally, so callers need not guard.

Use the new CPUStopWatch to move KV CPU measurement from the caller
side (wrapping each NextKV call) down into the KV send layer (wrapping
txn.Send calls). This captures the local goroutine CPU overhead of KV
operations more accurately.

Key changes:

  • Simplify CPUStopWatch from a heap-allocated, mutex-protected type to
    a stack-friendly value type. The old private cpuStopWatch (used by
    StopWatch) is moved to stopwatch.go.
  • Add localKVCPUTime *int64 plumbing through KVBatchFetcher, KVFetcher,
    and Fetcher, with atomic accumulation in the send closures.
  • The streaming KV fetcher passes nil since it dispatches KV requests on
    async goroutines where per-goroutine grunning is not meaningful.
  • Remove the old per-operator CPUStopWatch that wrapped NextKV calls and
    reserve the old KVCPUTime field in ComponentStats.
  • Add LocalKVCPUTime to the Metrics proto for metadata-based propagation.

Informs #168272
Release note: None

sql: propagate local KV CPU from mutation and vecindex operators

Add CPUStopWatch instrumentation to mutation operators (INSERT, UPDATE,
DELETE, UPSERT and their variants) and vecindex store transactions.

For mutations going through tableWriterBase, the CPUStopWatch wraps
Run() and CommitInBatch() calls. For deleteRangeNode and
insertFastPathNode, which have their own KV send paths, the
CPUStopWatch is embedded directly. For vecindex, the CPUStopWatch
wraps kv.Run() calls in runAndRecordStats().

All mutation nodes expose localKVCPUTime() via the mutationPlanNode
interface, which feeds into the metadata propagation chain.

Informs #168272
Release note: None

sql: propagate LocalKVCPUTime from all KV operators for query-level CPU

Add a LocalKVCPUTime field to the Metrics proto message to carry the
grunning-based local KV CPU time from operators to the gateway. Report
this metric from all KV-reading operators: ColBatchScan,
ColBatchDirectScan, ColIndexJoin, tableReader, joinReader,
invertedJoiner, zigzagJoiner, and wrapped mutationPlanNodes.

Accumulate LocalKVCPUTime in topLevelQueryStats via the DistSQLReceiver
pushMeta path, and forward it through forwardInnerQueryStats for
cascades and triggers.

Informs #168272
Release note: None

Co-Authored-By: roachdev-claude roachdev-claude-bot@cockroachlabs.com

sql: add always-on per-query SQL CPU measurement

Introduce unconditional grunning-based CPU capture on both gateway and
remote flows, and aggregate the measurements at the gateway to produce
per-query SQL CPU time.

On the gateway, goroutine CPU time is sampled at statement start and
again at completion. Remote flows capture grunning at flow start and
emit the delta as Metrics.SQLCPUTime trailing metadata when the last
outbox finishes. The gateway aggregates these into sqlCPUTimeNanos
after subtracting local KV CPU time (which overlaps with grunning).

Key changes:

  • Gateway captures grunning.Time() at statement start unconditionally
    (gated on grunning.Supported, not IncludeRUEstimateInExplainAnalyze).
  • Remote flows capture grunning.Time() at flow start unconditionally.
  • Both row-engine and vectorized outboxes emit per-flow grunning via
    Metrics.SQLCPUTime when the last outbox on a remote node finishes.
  • Vectorized outbox flow-level stats (MaxMemUsage, MaxDiskUsage,
    ConsumedRU) are moved from getStats to sendDrainedMetadata for
    symmetry with the row-engine outbox.
  • EXPLAIN ANALYZE unconditionally displays SQL CPU time, removing the
    previous vectorized-only and non-mutation restrictions.

The CPUUsageHelper (process-wide CPU estimator) remains alive and
continues to handle RU estimation for EXPLAIN ANALYZE; it will be
replaced in a subsequent commit.

Fixes #168272
Release note: None

Co-Authored-By: roachdev-claude roachdev-claude-bot@cockroachlabs.com

sql: replace CPUUsageHelper with always-on SQL CPU for RU estimation

Now that per-query SQL CPU time is always available via grunning-based
measurement, remove the old process-wide CPU estimator (CPUUsageHelper)
and use the per-query measurement for RU estimation in EXPLAIN ANALYZE.

Key changes:

  • Delete CPUUsageHelper type from multitenantcpu package.
  • Remove cpuStatsCollector from connExecutor and TenantCPUMonitor from
    FlowCtx.
  • Remove TenantCPUMonitor.StartCollection from remote flow startup.
  • Remove ConsumedRU emission from both row-engine and vectorized
    outboxes (no longer needed since RU is computed at the gateway from
    per-query SQL CPU).
  • Remove FlowStats.ConsumedRU accumulation in trace analyzer.
  • Switch RU estimation in populateQueryLevelStats from
    cpuStats.EndCollection to costCfg.PodCPUCost(sqlCPUTimeNanos).
  • Rewrite TestEstimateQueryRUConsumption to validate per-query RU
    estimates against expected values with tolerance.

Informs #168272
Release note: None

Co-Authored-By: roachdev-claude roachdev-claude-bot@cockroachlabs.com

sql: record SQL CPU time in statement and transaction statistics

Wire the always-on sqlCPUTimeNanos measurement into the statement and
transaction stats recording pipeline.

  • Add sql_cpu_time_nanos field to StatementStatistics and
    TransactionStatistics protos.
  • Accumulate per-statement SQL CPU time at the transaction level.
  • Record in RecordStatement and RecordTransaction.
  • Add JSON serialization support.

Informs #168272
Release note: None

Co-Authored-By: roachdev-claude roachdev-claude-bot@cockroachlabs.com

DrewKimball and others added 6 commits May 1, 2026 15:07
Redesign CPUStopWatch as a stack-allocated value type where Start()
records the grunning timestamp and Stop() returns the elapsed CPU
duration. Callers atomically accumulate the delta themselves. All
methods check grunning.Supported internally, so callers need not guard.

Use the new CPUStopWatch to move KV CPU measurement from the caller
side (wrapping each NextKV call) down into the KV send layer (wrapping
txn.Send calls). This captures the local goroutine CPU overhead of KV
operations more accurately.

Key changes:
- Simplify CPUStopWatch from a heap-allocated, mutex-protected type to
  a stack-friendly value type. The old private cpuStopWatch (used by
  StopWatch) is moved to stopwatch.go.
- Add `localKVCPUTime *int64` plumbing through KVBatchFetcher, KVFetcher,
  and Fetcher, with atomic accumulation in the send closures.
- The streaming KV fetcher passes nil since it dispatches KV requests on
  async goroutines where per-goroutine grunning is not meaningful.
- Remove the old per-operator CPUStopWatch that wrapped NextKV calls and
  reserve the old KVCPUTime field in ComponentStats.
- Add LocalKVCPUTime to the Metrics proto for metadata-based propagation.

Informs cockroachdb#168272
Release note: None
Add CPUStopWatch instrumentation to mutation operators (INSERT, UPDATE,
DELETE, UPSERT and their variants) and vecindex store transactions.

For mutations going through tableWriterBase, the CPUStopWatch wraps
Run() and CommitInBatch() calls. For deleteRangeNode and
insertFastPathNode, which have their own KV send paths, the
CPUStopWatch is embedded directly. For vecindex, the CPUStopWatch
wraps kv.Run() calls in runAndRecordStats().

All mutation nodes expose localKVCPUTime() via the mutationPlanNode
interface, which feeds into the metadata propagation chain.

Informs cockroachdb#168272
Release note: None
Add a LocalKVCPUTime field to the Metrics proto message to carry the
grunning-based local KV CPU time from operators to the gateway. Report
this metric from all KV-reading operators: ColBatchScan,
ColBatchDirectScan, ColIndexJoin, tableReader, joinReader,
invertedJoiner, zigzagJoiner, and wrapped mutationPlanNodes.

Accumulate LocalKVCPUTime in topLevelQueryStats via the DistSQLReceiver
pushMeta path, and forward it through forwardInnerQueryStats for
cascades and triggers.

Informs cockroachdb#168272
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Introduce unconditional grunning-based CPU capture on both gateway and
remote flows, and aggregate the measurements at the gateway to produce
per-query SQL CPU time.

On the gateway, goroutine CPU time is sampled at statement start and
again at completion. Remote flows capture grunning at flow start and
emit the delta as Metrics.SQLCPUTime trailing metadata when the last
outbox finishes. The gateway aggregates these into sqlCPUTimeNanos
after subtracting local KV CPU time (which overlaps with grunning).

Key changes:
- Gateway captures grunning.Time() at statement start unconditionally
  (gated on grunning.Supported, not IncludeRUEstimateInExplainAnalyze).
- Remote flows capture grunning.Time() at flow start unconditionally.
- Both row-engine and vectorized outboxes emit per-flow grunning via
  Metrics.SQLCPUTime when the last outbox on a remote node finishes.
- Vectorized outbox flow-level stats (MaxMemUsage, MaxDiskUsage,
  ConsumedRU) are moved from getStats to sendDrainedMetadata for
  symmetry with the row-engine outbox.
- EXPLAIN ANALYZE unconditionally displays SQL CPU time, removing the
  previous vectorized-only and non-mutation restrictions.

The CPUUsageHelper (process-wide CPU estimator) remains alive and
continues to handle RU estimation for EXPLAIN ANALYZE; it will be
replaced in a subsequent commit.

Fixes cockroachdb#168272
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Now that per-query SQL CPU time is always available via grunning-based
measurement, remove the old process-wide CPU estimator (CPUUsageHelper)
and use the per-query measurement for RU estimation in EXPLAIN ANALYZE.

Key changes:
- Delete CPUUsageHelper type from multitenantcpu package.
- Remove cpuStatsCollector from connExecutor and TenantCPUMonitor from
  FlowCtx.
- Remove TenantCPUMonitor.StartCollection from remote flow startup.
- Remove ConsumedRU emission from both row-engine and vectorized
  outboxes (no longer needed since RU is computed at the gateway from
  per-query SQL CPU).
- Remove FlowStats.ConsumedRU accumulation in trace analyzer.
- Switch RU estimation in populateQueryLevelStats from
  cpuStats.EndCollection to costCfg.PodCPUCost(sqlCPUTimeNanos).
- Rewrite TestEstimateQueryRUConsumption to validate per-query RU
  estimates against expected values with tolerance.

Informs cockroachdb#168272
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Wire the always-on sqlCPUTimeNanos measurement into the statement and
transaction stats recording pipeline.

- Add sql_cpu_time_nanos field to StatementStatistics and
  TransactionStatistics protos.
- Accumulate per-statement SQL CPU time at the transaction level.
- Record in RecordStatement and RecordTransaction.
- Add JSON serialization support.

Informs cockroachdb#168272
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented May 1, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented May 1, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@DrewKimball
Copy link
Copy Markdown
Collaborator Author

First PR is here: #169587

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: collect SQL CPU Time on all query executions as a top-level query stat

2 participants