server: don't gate failed txn insights on contention resolution#169295
dhartunian wants to merge 1 commit into cockroachdb:master
When listing execution insights, transactions with HighContention as a cause were deferred to contention event validation: the insight was only shown once the blocking transaction's fingerprint ID was resolved. This filtering applied unconditionally, even when the insight was independently valid due to FailedExecution.

A transaction that fails on COMMIT with a 40001 serialization conflict has both FailedExecution (problem) and HighContention (cause). Under race, contention resolution can be delayed long enough to exceed the SucceedsSoon timeout, causing the insight to never appear in crdb_internal.node_txn_execution_insights.

Skip contention validation for insights that have FailedExecution, since they are valid regardless of contention resolution state.

Fixes: cockroachdb#169115
Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
kyle-a-wong left a comment
Is this something we are adding just for the sake of making the test pass, or do we actually want insights to not wait for the contention event when it also contains a failed execution insight? The reason this code was originally added was because sometimes users wouldn't see the corresponding contention events when a high contention insight is created, and this sort of re-introduces that issue.
Do you think we should just skip this specific test under race / deadlock instead?
True, but i think a Note that a |
When listing execution insights via crdb_internal.node_txn_execution_insights, localExecutionInsights defers all insights with HighContention as a cause to contention event validation: the insight is only returned once the blocking transaction's fingerprint ID has been resolved. This filtering applied unconditionally, even when the insight was independently valid due to FailedExecution.

A transaction that fails on COMMIT with a 40001 serialization conflict gets both FailedExecution (problem) and HighContention (cause). Under race, contention resolution can be slow enough to exceed the SucceedsSoon timeout (3m45s), causing the insight to never appear.

The fix adds a !FailedExecution guard to the contention validation path so that failed transaction insights are always returned immediately, regardless of contention resolution state.
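
For reviewers who want the intended behavior at a glance, here is a minimal, self-contained sketch of the guard. It is not the actual diff: the Insight struct, its fields, and the shouldList helper are hypothetical stand-ins invented for illustration, and the real logic lives in localExecutionInsights with the real insights types.

```go
package main

import "fmt"

// Insight is a hypothetical, simplified stand-in for a transaction
// execution insight. The real type in CockroachDB differs.
type Insight struct {
	Problems           []string // e.g. "FailedExecution"
	Causes             []string // e.g. "HighContention"
	ContentionResolved bool     // assumed flag: blocking txn fingerprint ID is known
}

// contains reports whether s is present in xs.
func contains(xs []string, s string) bool {
	for _, x := range xs {
		if x == s {
			return true
		}
	}
	return false
}

// shouldList is a hypothetical helper that mirrors the described fix:
// only insights that have HighContention as a cause and are NOT failed
// executions wait for contention resolution.
func shouldList(in Insight) bool {
	failed := contains(in.Problems, "FailedExecution")
	highContention := contains(in.Causes, "HighContention")
	if highContention && !failed {
		// Contention-only insight: defer until the blocker is resolved.
		return in.ContentionResolved
	}
	// Failed executions (and insights without HighContention) are
	// valid on their own and are listed immediately.
	return true
}

func main() {
	failedAndContended := Insight{
		Problems: []string{"FailedExecution"},
		Causes:   []string{"HighContention"},
	}
	contendedOnly := Insight{
		Problems: []string{"SlowExecution"},
		Causes:   []string{"HighContention"},
	}
	fmt.Println("failed + contended, unresolved:", shouldList(failedAndContended))
	fmt.Println("contended only, unresolved:", shouldList(contendedOnly))
}
```

Under this sketch, a contention-only insight still waits for its contention event, which keeps the original behavior that the contention validation was added for; only insights that also carry FailedExecution bypass the wait.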
Fixes: #169115
Epic: none
Release note: None