admission: garbage collect resource manager groups less frequently #170630

Merged
trunk-io[bot] merged 1 commit into cockroachdb:master from stevendanna:ssd/ctt-metric-gc on May 28, 2026

@stevendanna

Collaborator

gcGroupsResetUsedAndUpdateEstimators previously dropped any non-built-in groupInfo after one second of inactivity. That could cause groupInfoPool churn for tenants whose work arrived in bursts and, more importantly, left holes in per-tenant metric series: releaseGroupInfo Unlinks the group's per-group AggCounter children, so at typical 10–30s scrape intervals the group's labeled time series could disappear before ever being observed.

This change decouples GC from the used reset. Each groupInfo now carries an idleSince that the GC loop refreshes on activity. A group is dropped only once (now - idleSince) >= groupGCIdleThreshold, currently one minute. Built-ins remain exempt. The fair-share used reset still happens every tick.

TestGCKeepsMetricChildVisibleAcrossScrapes exercises the scenario directly: it admits work for a tenant, simulates several sub-threshold GC ticks while scraping the per-group parent, and confirms the labeled child stays visible until the threshold elapses.

Epic: none

@stevendanna stevendanna requested a review from a team as a code owner May 20, 2026 11:05
@trunk-io

trunk-io Bot commented May 20, 2026

Contributor

😎 Merged successfully - details.

@cockroach-teamcity

Member

This change is Reviewable

@stevendanna stevendanna requested a review from tbg May 22, 2026 11:20
gcGroupsResetUsedAndUpdateEstimators previously dropped any
non-built-in groupInfo after one second of inactivity. That could
cause groupInfoPool churn for tenants whose work arrived in
bursts. It also left holes in per-tenant metric series:
releaseGroupInfo Unlinks the group's per-group AggCounter
children, so at typical 10-30s scrape intervals their labeled
time series could disappear before ever being observed.

Here, we decouple GC from the used reset. Each groupInfo now
carries an idleSince that the GC loop refreshes on activity. A
group is dropped only once (now - idleSince) >=
groupGCIdleThreshold, currently one minute. Built-ins remain
exempt. The fair-share used reset still happens every tick.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@stevendanna

Collaborator Author

/trunk merge

@trunk-io trunk-io Bot merged commit f2dc3af into cockroachdb:master May 28, 2026
36 of 37 checks passed