sql: populate query-level stats earlier & add contention to telemetry log #84718

THardy98 · 2022-07-20T15:37:08Z

Addresses: #71328

This change adds contention time (measured in nanoseconds) to the
SampledQuery telemetry log.

To accommodate this change, we needed to collect query-level statistics
earlier. Previously, query-level statistics were fetched when we called
Finish under the instrumentationHelper; however, this occurred after
we had already emitted our query execution logs. Now, we collect
query-level stats in dispatchToExecutionEngine after we've executed
the query.
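
The ordering issue can be sketched roughly as follows. This is only an illustration, not the actual connExecutor code: the helper type, emitLog, oldFlow, newFlow, and the value 1500 are all hypothetical stand-ins.

```go
package main

import "fmt"

// helper is a hypothetical stand-in for the instrumentation helper. It holds
// only the one statistic this sketch cares about.
type helper struct {
	contentionNanos int64
	statsPopulated  bool
}

// populateStats mimics collecting query-level stats from the trace.
func (h *helper) populateStats(contentionNanos int64) {
	h.contentionNanos = contentionNanos
	h.statsPopulated = true
}

// emitLog mimics emitting the sampled-query log entry. It can only report
// contention if the stats were populated before it runs.
func emitLog(h *helper) string {
	if !h.statsPopulated {
		return "SampledQuery{ContentionNanos: <unavailable>}"
	}
	return fmt.Sprintf("SampledQuery{ContentionNanos: %d}", h.contentionNanos)
}

// oldFlow emits the log first and populates stats afterwards (as in Finish).
func oldFlow() string {
	h := &helper{}
	line := emitLog(h)
	h.populateStats(1500)
	return line
}

// newFlow populates stats right after execution, before the log is emitted.
func newFlow() string {
	h := &helper{}
	h.populateStats(1500)
	return emitLog(h)
}

func main() {
	fmt.Println("old:", oldFlow())
	fmt.Println("new:", newFlow())
}
```

In the old ordering the log entry has nothing to report, which is why the stats collection had to move ahead of log emission.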

As a tradeoff to collecting query-level stats earlier, we need to fetch
the trace twice:

- once when populating query-level stats (trace is required to compute
  query-level stats) at populateQueryLevelStats in
  dispatchToExecutionEngine after query execution
- once during the instrumentation helper's Finish (as we do currently)

This allows us to collect query-level stats earlier without omitting any
tracing events we record currently. This approach is safer, and the
additional overhead of fetching the trace twice only occurs at the
tracing sampling rate of 1-2%, which is fairly conservative. The concern
with only fetching the trace at query-level stats population was the
omission of a number of events that occur in
commitSQLTransactionInternal (or any execution paths that don't lead
to dispatchToExecutionEngine).
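
A minimal sketch of why the second fetch matters, assuming a heavily simplified span type (span, record, recording, and the event names are hypothetical and not the real tracing API):

```go
package main

import "fmt"

// span is a hypothetical, heavily simplified tracing span.
type span struct {
	events []string
}

// record appends an event to the span.
func (s *span) record(ev string) { s.events = append(s.events, ev) }

// recording returns a copy of the events recorded so far.
func (s *span) recording() []string {
	return append([]string(nil), s.events...)
}

// runStatement fetches the recording twice: once right after execution (for
// query-level stats) and once at the end (for the full trace).
func runStatement() (statsTrace, finalTrace []string) {
	sp := &span{}
	sp.record("plan")
	sp.record("execute")
	statsTrace = sp.recording() // first fetch, used to compute stats

	sp.record("commit") // recorded after the first fetch
	finalTrace = sp.recording() // second fetch, sees every event
	return statsTrace, finalTrace
}

func main() {
	statsTrace, finalTrace := runStatement()
	fmt.Println("stats fetch:", statsTrace)
	fmt.Println("final fetch:", finalTrace)
}
```

The first fetch is enough for the stats, while the second one still captures whatever is recorded afterwards.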

Release note (sql change): Add ContentionTime field to the
SampledQuery telemetry log. Query-level statistics are collected
earlier to make contention time available to query execution
logs. The earlier collection of query-level statistics requires the
additional overhead of fetching the query's trace twice (instead of
once, as before).

yuzefovich

The connExecutor changes

Reviewed 7 of 8 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @THardy98)

pkg/sql/instrumentation.go line 285 at r2 (raw file):

	}

	if collectExecStats || ih.implicitTxn {

nit: previously this block would only execute if there was no error returned by GetQueryLevelStats, maybe we should preserve that behavior?

THardy98

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)

pkg/sql/instrumentation.go line 285 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: previously this block would only execute if there was no error returned by GetQueryLevelStats, maybe we should preserve that behavior?

This and RecordStatementExecStats previously did not run if an error was encountered during GetQueryLevelStats.

Changed queryLevelStats to queryLevelStatsWithErr in instrumentationHelper to keep track of whether we encountered an error during GetQueryLevelStats. We check the Err value of this, and if Err is nil (no error has occurred), then we accumulate transaction stats and record statement execution stats.
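
Roughly, the check looks like the sketch below. The types are simplified stand-ins rather than the real ones: the Stats field name, txnStats, accumulate, and errTraceAnalysis are hypothetical, and only the Err check mirrors what is described above.

```go
package main

import (
	"errors"
	"fmt"
)

// QueryLevelStats is a simplified stand-in carrying a single statistic.
type QueryLevelStats struct {
	ContentionNanos int64
}

// QueryLevelStatsWithErr pairs the stats with any error hit while computing
// them. The Stats field name is an assumption made for this sketch.
type QueryLevelStatsWithErr struct {
	Stats QueryLevelStats
	Err   error
}

// errTraceAnalysis is a hypothetical error used to exercise the failure path.
var errTraceAnalysis = errors.New("trace analysis failed")

// txnStats is a hypothetical accumulator for transaction-level stats.
type txnStats struct {
	contentionNanos int64
	statements      int
}

// accumulate adds the statement's stats only when no error was recorded,
// preserving the previous behavior. It reports whether the stats were used.
func (t *txnStats) accumulate(s QueryLevelStatsWithErr) bool {
	if s.Err != nil {
		return false
	}
	t.contentionNanos += s.Stats.ContentionNanos
	t.statements++
	return true
}

func main() {
	var txn txnStats
	ok := txn.accumulate(QueryLevelStatsWithErr{Stats: QueryLevelStats{ContentionNanos: 1500}})
	fmt.Println(ok, txn.contentionNanos, txn.statements)

	ok = txn.accumulate(QueryLevelStatsWithErr{Err: errTraceAnalysis})
	fmt.Println(ok, txn.contentionNanos, txn.statements)
}
```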

yuzefovich

Reviewed 5 of 5 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @THardy98)

maryliag

Reviewed 3 of 8 files at r1, 1 of 1 files at r2, 2 of 5 files at r3, 2 of 3 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @THardy98 and @yuzefovich)

-- commits line 16 at r4:
what type of events? anything we're collecting today that we will lose?

pkg/sql/conn_executor_exec.go line 1208 at r4 (raw file):

}

func populateQueryLevelStats(ctx context.Context, p *planner) {

are there any tests that you can add to make sure the new flow didn't change the final behaviour?

THardy98

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yahor and @yuzefovich)

-- commits line 16 at r4:

Previously, maryliag (Marylia Gutierrez) wrote…

what type of events? anything we're collecting today that we will lose?

In the case where we could be missing events, these would be events that we record after query execution. From chatting with @yahor, my understanding is that our trace provides the vast majority of its value from recording events prior to, and during query execution. If we are missing post-query execution events, they are likely inconsequential.

In the case that they aren't, and we really do need them, we could fetch the trace again later and ensure we have all events. The tradeoff here is that fetching the trace is somewhat expensive, but considering we only trace a very small percentage of queries (under a configured sampling rate of something like 1 or 2%), this shouldn't be much of an issue.

pkg/sql/conn_executor_exec.go line 1208 at r4 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

are there any tests that you can add to make sure the new flow didn't change the final behaviour?

From chatting with @yuzefovich the testing for tracing is somewhat ad-hoc, but there do seem to exist tests that ensure we're populating events that we are expecting, the primary one I believe being TestTrace.

Additionally there do seem to be failing tests in CI that seem to be related, which is a good thing.

THardy98 · 2022-07-30T18:13:25Z

@ajwerner @postamar I'm noticing CI failures with rtt analysis benchmarking tests for schema change queries. The changes in this PR are recording 5 fewer round trips than expected in the tests. I'm not familiar enough with tracing or schema queries to know why this would occur, or if this is necessarily a bad thing.

It looks like you both wrote the majority of these tests, so I thought I'd ask for your thoughts on these failures/changes.

ajwerner · 2022-07-31T15:23:14Z

The changes in this PR are recording 5 fewer round trips than expected in the tests. I'm not familiar enough with tracing or schema queries to know why this would occur, or if this is necessarily a bad thing.

I can't figure out just from looking at this PR why it would reduce the round-trips. Reducing round-trips is a good thing. You can rewrite the expectations using the --rewrite flag, though beware, it can be slow to rewrite them all.

yuzefovich

I can't figure out just from looking at this PR why it would reduce the round-trips. Reducing round-trips is a good thing.

This change might have an effect that some events are no longer included in the trace, so I don't think that we get actual RTT reduction here - instead, the trace might no longer capture some of the stuff (like say in some defer we're releasing some descriptors / leases which do some RTTs, and we no longer capture them in the trace).
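
To illustrate the defer case with a toy example (the trace type, runSchemaChange, and the round-trip counts are made up for this sketch and are not taken from the failing benchmarks):

```go
package main

import "fmt"

// trace is a hypothetical stand-in that just counts round trips.
type trace struct {
	roundTrips int
}

// runSchemaChange performs three round trips during execution and two more in
// a deferred cleanup. The returned count is evaluated at the return statement,
// before the deferred function runs, so it misses the cleanup round trips.
func runSchemaChange(tr *trace) int {
	defer func() {
		tr.roundTrips += 2 // e.g. cleanup work that still does round trips
	}()
	tr.roundTrips += 3
	return tr.roundTrips // snapshot taken before the deferred cleanup
}

func main() {
	tr := &trace{}
	counted := runSchemaChange(tr)
	fmt.Println("counted in snapshot:", counted)
	fmt.Println("actually performed:", tr.roundTrips)
}
```

The work still happens; it is just no longer visible in a snapshot taken that early.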

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @maryliag, @yahor, and @yuzefovich)

THardy98 · 2022-08-02T16:14:13Z

I can't figure out just from looking at this PR why it would reduce the round-trips. Reducing round-trips is a good thing.

This change might have an effect that some events are no longer included in the trace, so I don't think that we get actual RTT reduction here - instead, the trace might no longer capture some of the stuff (like say in some defer we're releasing some descriptors / leases which do some RTTs, and we no longer capture them in the trace).

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @maryliag, @yahor, and @yuzefovich)

With this in mind, @ajwerner, would you still recommend simply using --rewrite to update the expected number of round trips for the corresponding failing benchmark tests?

THardy98 · 2022-08-04T13:26:44Z

After chatting with @ajwerner, we've decided to fetch the trace twice to preserve all tracing events:

- when we populate query-level stats immediately after query execution
- when we call the instrumentation helper's Finish

The concern with only fetching the trace at query-level stats population was missing a number of events that occur in commitSQLTransactionInternal (or any execution paths that don't lead to dispatchToExecutionEngine, like this switch case). This approach is safer, and the additional overhead of fetching the trace twice only occurs at the tracing sampling rate of 1-2%, which is fairly conservative.
Will update the PR description/commit message accordingly.

THardy98

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @maryliag, @yahor, and @yuzefovich)

-- commits line 16 at r4:

Previously, THardy98 (Thomas Hardy) wrote…

In the case where we could be missing events, these would be events that we record after query execution. From chatting with @yahor, my understanding is that our trace provides the vast majority of its value from recording events prior to, and during query execution. If we are missing post-query execution events, they are likely inconsequential.

In the case that they aren't, and we really do need them, we could fetch the trace again later and ensure we have all events. The tradeoff here is that fetching the trace is somewhat expensive, but considering we only trace a very small percentage of queries (under a configured sampling rate of something like 1 or 2%), this shouldn't be much of an issue.

This is no longer the case now that we fetch the trace twice. No events would be omitted.

pkg/sql/conn_executor_exec.go line 1208 at r4 (raw file):

Previously, THardy98 (Thomas Hardy) wrote…

From chatting with @yuzefovich the testing for tracing is somewhat ad-hoc, but there do seem to exist tests that ensure we're populating events that we are expecting, the primary one I believe being TestTrace.

Additionally there do seem to be failing tests in CI that seem to be related, which is a good thing.

On the tracing side, now that we fetch the trace twice, we will not be omitting any events, so no changes there.

On the query-level stats side, there are existing tests as part of TestSampledStatsCollection that ensure we get the statistic values we're expecting for sql stats.

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @maryliag, @THardy98, @yahor, and @yuzefovich)

pkg/sql/conn_executor_exec.go line 1216 at r8 (raw file):

}

func populateQueryLevelStats(ctx context.Context, p *planner) {

this function needs a comment

pkg/sql/conn_executor_exec.go line 1218 at r8 (raw file):

func populateQueryLevelStats(ctx context.Context, p *planner) {
	ih := &p.instrumentation
	if ih.sp == nil {

there's precedent for this test written like this in getNodesFromPlanner, but let's introduce a method on ih like Tracing() (sp, ok bool) or such, and let's also put a comment on ih.sp about when it's set.
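
Something along these lines, as a rough sketch only: the span type and the struct layout are simplified stand-ins, statsSource is a hypothetical caller, and the real comment on sp should say when it is actually set.

```go
package main

import "fmt"

// span is a hypothetical stand-in for the tracing span type.
type span struct {
	operation string
}

// instrumentationHelper is reduced to the one field relevant here.
type instrumentationHelper struct {
	// sp is nil when no span is attached to this helper. The exact conditions
	// under which it is set are deliberately left out of this sketch.
	sp *span
}

// Tracing returns the helper's span and whether one is attached, so callers
// don't have to compare ih.sp against nil themselves.
func (ih *instrumentationHelper) Tracing() (*span, bool) {
	return ih.sp, ih.sp != nil
}

// statsSource shows how a caller might bail out early when not tracing.
func statsSource(ih *instrumentationHelper) string {
	sp, ok := ih.Tracing()
	if !ok {
		return "no span, skipping query-level stats"
	}
	return "collecting stats from span " + sp.operation
}

func main() {
	fmt.Println(statsSource(&instrumentationHelper{}))
	fmt.Println(statsSource(&instrumentationHelper{sp: &span{operation: "stmt"}}))
}
```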

pkg/sql/conn_executor_exec.go line 1226 at r8 (raw file):

		flowsMetadata = append(flowsMetadata, flowInfo.flowsMetadata)
	}
	trace := ih.sp.GetConfiguredRecording()

does this ever need verbose recording? If not, do GetRecording(Structured).

pkg/sql/conn_executor_exec.go line 1228 at r8 (raw file):

	trace := ih.sp.GetConfiguredRecording()
	var err error
	queryLevelStats, err := execstats.GetQueryLevelStats(trace, p.execCfg.TestingKnobs.DeterministicExplain, flowsMetadata)

nit: line too long

pkg/sql/conn_executor_exec.go line 1228 at r8 (raw file):

	trace := ih.sp.GetConfiguredRecording()
	var err error
	queryLevelStats, err := execstats.GetQueryLevelStats(trace, p.execCfg.TestingKnobs.DeterministicExplain, flowsMetadata)

GetQueryLevelStats seems to make assumptions about what it finds in the trace. Do those assumptions hold when the ih did not create the span? I think they do, because the span in question is still limited to the current statement (but please check). But this all seems very shaky; if it wasn't for that span creation for the statement, it would break (I think).

I think we should either a) have the ih always create a span, or b) never create a span, and instead rely on (and assert) that there is a statement span.
I think a) is easier but b) is nicer.

For b), I think we need to:

1. add the WithForceRealSpan option to the statement span creation, in order to ensure that the span is created even when the trace.span_registry.enabled cluster setting is turned off (the default is on).
2. in the ih, assert that we're operating on a statement span, by checking the span's operation name
3. change the tracing library so that children of a "real span" are not automatically forced to be real spans themselves. The relevant check is the opts.parentTraceID at span creation time. I was surprised that the check is there; I don't think it's intentional. I don't think that the parent being a real span is reason enough for the child to be a real span. Valid reasons for the child being a real span include the active spans registry being enabled (which is the default), or the parent recording at the time the child is being created.

Without 3), bullet number 1) would effectively neuter the trace.span_registry.enabled = off setting because, by creating a real span at such a high level, we'd essentially be creating spans everywhere.

WDYT?
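
To make the rule proposed in 3) concrete, here is a rough sketch of the decision. It is not the tracing library's actual code: spanOpts, its fields, and needsRealSpan are hypothetical names for the conditions listed above.

```go
package main

import "fmt"

// spanOpts is a hypothetical summary of the inputs to span creation.
type spanOpts struct {
	forceReal       bool // the creator asked for a real span explicitly
	registryEnabled bool // the active spans registry is enabled
	parentRecording bool // the parent is recording when the child is created
}

// needsRealSpan applies the proposed rule: a span is real if it is forced, if
// the registry is enabled, or if the parent is recording. The parent merely
// being a real span is intentionally not an input.
func needsRealSpan(o spanOpts) bool {
	return o.forceReal || o.registryEnabled || o.parentRecording
}

func main() {
	// Statement span, forced, with the registry turned off.
	fmt.Println(needsRealSpan(spanOpts{forceReal: true}))
	// Child of that span: registry off and parent not recording.
	fmt.Println(needsRealSpan(spanOpts{}))
	// Child created while the parent is recording.
	fmt.Println(needsRealSpan(spanOpts{parentRecording: true}))
}
```

Under this rule, forcing the statement span to be real would not cascade to every descendant when the registry is off.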

pkg/sql/execstats/traceanalyzer.go line 138 at r8 (raw file):

// ConstructQueryLevelStatsWithErr creates a QueryLevelStatsWithErr from a
// QueryLevelStats and error.
func ConstructQueryLevelStatsWithErr(stats QueryLevelStats, err error) QueryLevelStatsWithErr {

MakeQueryLevelStatsWithErr

pkg/util/log/eventpb/telemetry.proto line 94 at r8 (raw file):

  // The duration of time in nanoseconds that the query experienced contention.
  int64 contention_time = 22 [(gogoproto.jsontag) = ',omitempty'];

contention_nanos
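
For reference, a small sketch of how the field would show up in JSON with the suggested name, assuming the generated Go field ends up with an equivalent `json:",omitempty"` struct tag. The sampledQuery struct and encode helper are hypothetical and not the generated eventpb code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sampledQuery is a hypothetical stand-in for the generated telemetry event.
type sampledQuery struct {
	// ContentionNanos is the time in nanoseconds the query spent in contention.
	ContentionNanos int64 `json:",omitempty"`
}

// encode marshals the event to JSON and returns it as a string.
func encode(q sampledQuery) string {
	b, err := json.Marshal(q)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(sampledQuery{ContentionNanos: 1500}))
	fmt.Println(encode(sampledQuery{})) // zero contention is omitted
}
```

With the unit in the name, a reader of the log doesn't need the proto comment to know the value is in nanoseconds.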

THardy98

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @maryliag, @THardy98, @yahor, and @yuzefovich)

pkg/sql/conn_executor_exec.go line 1216 at r8 (raw file):