Skip to content

server: don't gate failed txn insights on contention resolution#169295

Open
dhartunian wants to merge 1 commit intocockroachdb:masterfrom
dhartunian:davidh/fix-failed-insight-contention-filter
Open

server: don't gate failed txn insights on contention resolution#169295
dhartunian wants to merge 1 commit intocockroachdb:masterfrom
dhartunian:davidh/fix-failed-insight-contention-filter

Conversation

@dhartunian
Copy link
Copy Markdown
Collaborator

When listing execution insights via crdb_internal.node_txn_execution_insights,
localExecutionInsights defers all insights with HighContention as a cause
to contention event validation — the insight is only returned once the blocking
transaction's fingerprint ID has been resolved. This filtering applied
unconditionally, even when the insight was independently valid due to
FailedExecution.

A transaction that fails on COMMIT with a 40001 serialization conflict gets
both FailedExecution (problem) and HighContention (cause). Under race,
contention resolution can be slow enough to exceed the SucceedsSoon timeout
(3m45s), causing the insight to never appear.

The fix adds a !FailedExecution guard to the contention validation path so
that failed transaction insights are always returned immediately, regardless
of contention resolution state.

Fixes: #169115
Epic: none
Release note: None

When listing execution insights, transactions with HighContention as
a cause were deferred to contention event validation — the insight
was only shown once the blocking transaction's fingerprint ID was
resolved. This filtering applied unconditionally, even when the
insight was independently valid due to FailedExecution.

A transaction that fails on COMMIT with a 40001 serialization conflict
has both FailedExecution (problem) and HighContention (cause). Under
race, contention resolution can be delayed long enough to exceed the
SucceedsSoon timeout, causing the insight to never appear in
crdb_internal.node_txn_execution_insights.

Skip contention validation for insights that have FailedExecution,
since they are valid regardless of contention resolution state.

Fixes: cockroachdb#169115
Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@dhartunian dhartunian requested a review from a team as a code owner April 28, 2026 22:15
@dhartunian dhartunian requested review from jasonlmfong and removed request for a team April 28, 2026 22:15
@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented Apr 28, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented Apr 28, 2026

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@dhartunian dhartunian requested a review from kyle-a-wong April 28, 2026 22:19
@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented Apr 28, 2026

Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link)

Copy link
Copy Markdown
Contributor

@kyle-a-wong kyle-a-wong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we are adding just for the sake of making the test pass, or do we actually want insights to not wait for the contention event when it also contains a failed execution insight? The reason this code was originally added was because sometimes users wouldn't see the corresponding contention events when a high contention insight is created, and this sort of re-introduces that issue.

Do you think we should just skip this specific test under race / deadlock instead?

@alyshanjahani-crl
Copy link
Copy Markdown
Collaborator

The reason this code was originally added was because sometimes users wouldn't see the corresponding contention events when a high contention insight is created, and this sort of re-introduces that issue.

True, but i think a FailedExecution is important to show regardless of if we have the associated contention events resolved yet or not - the user should still see the txn failing and why. The fact that it also experienced high contention is kind of auxiliary at that point.

Note that a 40001 SERIALIZATION_CONFLICT doesn't make the insight HighContention - that failure mode can happen independently - so i do think this PR is a good behaviour change - now the user will see the FailedInsightDetailsPanel displaying the transaction info from the SERIALIZATION_CONFLICT contention event.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pkg/sql/sqlstats/insights/integration/integration_test: TestTransactionInsightsFailOnCommit failed

4 participants