sql: disable VC injection in TestTemporaryObjectCleaner #169914

Merged
trunk-io[bot] merged 1 commit into cockroachdb:master from DrewKimball:drewk/fix-temp-object-cleaner-flake-169663
May 8, 2026

@DrewKimball
Collaborator

The test wires a shared cleanup channel into the cleaner through the
`TempObjectsCleanupCh` testing knob (alongside
`OnTempObjectsCleanupDone`) and sends one trigger per node per cycle,
expecting one completion per node. With an auto-injected virtual
cluster, both the system tenant and the application layer run a
`TemporaryObjectCleaner` per pod, and they share the same testing
knobs, so 6 cleaners listen on a channel that only delivers 3 triggers
per cycle. Worse, the application-layer cleaners run with no
coordination (only the system-tenant cleaner is gated on the meta1
leaseholder), so concurrent application-layer cleaners can race on
`ListSessions` when a pod's SQL instance reader cache is briefly stale,
which can delete a still-active session's temp schema.
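
To make the trigger/listener mismatch concrete, here is a minimal, self-contained sketch in plain Go. It is not CockroachDB code: `cleaner`, `newCleaners`, `runCycle`, and `nodesCovered` are hypothetical names, and the sketch assumes each cleaner handles at most one trigger per cycle. It only illustrates that when 6 goroutines receive from one shared channel that carries 3 triggers, exactly 3 of them run, and nothing guarantees those 3 are spread one per node.

```go
package main

import (
	"fmt"
	"sync"
)

// cleaner is a hypothetical stand-in for one TemporaryObjectCleaner.
type cleaner struct {
	node   int
	tenant string // "system" or "app"
}

// newCleaners builds one system-tenant cleaner per node and, if
// withAppTenant is set, one application-layer cleaner per node as well.
func newCleaners(nodes int, withAppTenant bool) []cleaner {
	var cs []cleaner
	for n := 0; n < nodes; n++ {
		cs = append(cs, cleaner{node: n, tenant: "system"})
		if withAppTenant {
			cs = append(cs, cleaner{node: n, tenant: "app"})
		}
	}
	return cs
}

// runCycle starts every cleaner on ONE shared trigger channel, sends the
// given number of triggers, and returns the cleaners that completed.
// Assumption of this sketch: a cleaner handles at most one trigger per cycle.
func runCycle(cleaners []cleaner, triggers int) []cleaner {
	if triggers > len(cleaners) {
		triggers = len(cleaners) // avoid blocking on a send nobody receives
	}
	triggerCh := make(chan struct{})
	doneCh := make(chan cleaner, len(cleaners))
	var wg sync.WaitGroup
	for _, c := range cleaners {
		wg.Add(1)
		go func(c cleaner) {
			defer wg.Done()
			// Each trigger is consumed by exactly one receiver.
			if _, ok := <-triggerCh; ok {
				doneCh <- c
			}
		}(c)
	}
	for i := 0; i < triggers; i++ {
		triggerCh <- struct{}{}
	}
	close(triggerCh) // release the cleaners that never saw a trigger
	wg.Wait()
	close(doneCh)
	var completed []cleaner
	for c := range doneCh {
		completed = append(completed, c)
	}
	return completed
}

// nodesCovered counts how many distinct nodes had a cleaner complete.
func nodesCovered(completed []cleaner) int {
	seen := map[int]bool{}
	for _, c := range completed {
		seen[c.node] = true
	}
	return len(seen)
}

func main() {
	for _, withApp := range []bool{false, true} {
		cs := newCleaners(3, withApp)
		done := runCycle(cs, 3)
		fmt.Printf("cleaners=%d completions=%d distinct nodes=%d\n",
			len(cs), len(done), nodesCovered(done))
	}
}
```

With only system-tenant cleaners, every node completes once per cycle. With the application-layer cleaners added, the cycle still yields 3 completions, but which 3 of the 6 cleaners ran depends on goroutine scheduling, so a node can be skipped entirely while another is served twice.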

The race in the application-layer cleanup path is the real bug
exposed by this test; it is tracked separately in #169912, which
covers the production fix. The test itself was written for the
system-tenant single-cleaner-per-cycle path, so disable VC injection
here and let #169912 drive the production fix.

Fixes #169663.

Epic: none

Release note: None

@DrewKimball DrewKimball requested a review from a team as a code owner May 7, 2026 16:46
@DrewKimball DrewKimball requested review from mw5h and removed request for a team May 7, 2026 16:46
@trunk-io
Contributor

trunk-io Bot commented May 7, 2026

😎 Merged successfully.

Contributor

@mw5h mw5h left a comment

:lgtm:

@mw5h reviewed all commit messages and made 1 comment.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained.

@DrewKimball
Collaborator Author

TFTR

/trunk merge

@trunk-io trunk-io Bot merged commit 0c06c9d into cockroachdb:master May 8, 2026
25 checks passed
@DrewKimball DrewKimball deleted the drewk/fix-temp-object-cleaner-flake-169663 branch May 8, 2026 18:09
Linked issue: sql: TestTemporaryObjectCleaner failed