fix(compactor): Clean up stale bucket index metrics on ownership change #7487
Open
yeya24 wants to merge 1 commit into cortexproject:master from
When a tenant's compactor ownership changes due to ring rebalancing, the old compactor's cortex_bucket_index_last_successful_update_timestamp_seconds metric was not being cleaned up. This caused the metric to have duplicate series (one stale from the old owner, one fresh from the new owner), triggering false alarms on bucket index update rate.

The fix ensures scanUsers() properly detects tenants that were previously owned but are no longer in the current owned set, and cleans up their metrics. Also extracts metric deletion into a reusable deleteUserMetrics() helper to reduce code duplication.

Signed-off-by: Ben Ye <benye@amazon.com>
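The detection described above amounts to a set difference between the tenants owned after the previous scan and the tenants owned now. The sketch below is only an illustration of that idea, not the code in this PR: the `cleaner` and `gaugeVec` types, the `prevOwned` field, and the `scanUsers` signature are all assumptions made for the example, and a plain map stands in for the real Prometheus gauge.

```go
package main

import "sort"

// gaugeVec is a hypothetical stand-in for a per-tenant gauge such as
// cortex_bucket_index_last_successful_update_timestamp_seconds.
// Each key is a tenant ID and each entry models one exported series.
type gaugeVec map[string]float64

// cleaner is a simplified, hypothetical model of the component that scans
// tenants. It is not the type used in Cortex.
type cleaner struct {
	lastUpdate gaugeVec            // one series per tenant
	prevOwned  map[string]struct{} // tenants owned after the previous scan
}

func newCleaner() *cleaner {
	return &cleaner{
		lastUpdate: gaugeVec{},
		prevOwned:  map[string]struct{}{},
	}
}

// deleteUserMetrics drops every per-tenant series for userID. Keeping this
// in one helper means all cleanup paths remove the same set of series.
func (c *cleaner) deleteUserMetrics(userID string) {
	delete(c.lastUpdate, userID)
}

// scanUsers receives the tenants this compactor currently owns, deletes the
// metrics of tenants that were owned on the previous scan but are not owned
// now, and returns the IDs it cleaned up in sorted order.
func (c *cleaner) scanUsers(owned []string) []string {
	current := make(map[string]struct{}, len(owned))
	for _, u := range owned {
		current[u] = struct{}{}
	}

	var dropped []string
	for u := range c.prevOwned {
		if _, stillOwned := current[u]; !stillOwned {
			c.deleteUserMetrics(u)
			dropped = append(dropped, u)
		}
	}
	sort.Strings(dropped)

	// Remember the current set so the next scan can diff against it.
	c.prevOwned = current
	return dropped
}
```

With this shape, a tenant that moves to another compactor after a ring rebalance loses its series on the old owner at the next scan, so only the new owner keeps exporting a timestamp for it.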
What this PR does:
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags