[fix](fe) Deduplicate backends in ADMIN CLEAN TRASH command #63314
Closed
heguanhui wants to merge 1 commit into
### What problem does this PR solve?
Issue Number: close #xxx
Problem Summary: The ADMIN CLEAN TRASH command could send duplicate clean trash tasks to the same BE. When backendsQuery contains duplicate entries (e.g., same host:port specified twice), or when different string representations resolve to the same backend, getNeedCleanedBackends() would add the same backend multiple times. Additionally, cleanTrash() did not deduplicate by backend ID before creating tasks, so duplicate backends in the list would result in multiple CleanTrashTask objects being sent to the same BE.
### Release note
Fixed a bug where ADMIN CLEAN TRASH could send duplicate clean tasks to the same backend when duplicate backend addresses were specified.
### Check List (For Author)
- Test: Unit Test
- AdminCleanTrashCommandTest verifies dedup with duplicate queries, distinct queries, and clean-all scenarios
- Behavior changed: No
- Does this need documentation: No
### Summary
- Fix the `ADMIN CLEAN TRASH` command, where duplicate clean trash tasks could be sent to the same BE
- Add `Set<Long>` dedup by backend ID in both the `getNeedCleanedBackends()` and `cleanTrash()` methods
- Add `AdminCleanTrashCommandTest` covering duplicate queries, distinct queries, and clean-all scenarios

### Root Cause
In `getNeedCleanedBackends()`, `backendsID.remove(backendQuery)` deduplicates only by the string key format (`host:port`). If the user provides the same backend via different string representations or duplicate entries, the same backend could be added multiple times.

In `cleanTrash()`, there was no deduplication by backend ID. The method iterated over the `backends` list and created a `CleanTrashTask` for each entry without checking for duplicates. `CleanTrashTask` has signature `-1` and is NOT added to `AgentTaskQueue` (only `AgentTaskExecutor.submit(batchTask)` is called), so the queue dedup mechanism does not apply.

### Fix
- Add `Set<Long> addedIds` in `getNeedCleanedBackends()` to deduplicate by backend ID when processing `backendsQuery`
- Add `Set<Long> addedBackendIds` in `cleanTrash()` as a safety net to deduplicate by backend ID before creating tasks
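
The backend-ID deduplication described in this PR can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual Doris code: the `Backend` class, the `byAddress` lookup map, and the method names `selectBackends` and `planTaskTargets` are hypothetical stand-ins, and the real `getNeedCleanedBackends()` and `cleanTrash()` may be structured differently. The only idea taken from the PR is tracking seen backend IDs in a `Set<Long>`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CleanTrashDedupSketch {
    // Hypothetical stand-in for a backend; not the real Doris Backend class.
    static final class Backend {
        final long id;
        final String address;

        Backend(long id, String address) {
            this.id = id;
            this.address = address;
        }
    }

    // Resolve each query string and keep a backend only the first time its ID is seen.
    // Set.add returns false when the ID is already present, so repeats are skipped.
    static List<Backend> selectBackends(Map<String, Backend> byAddress, List<String> queries) {
        Set<Long> addedIds = new HashSet<>();
        List<Backend> result = new ArrayList<>();
        for (String query : queries) {
            Backend be = byAddress.get(query);
            if (be != null && addedIds.add(be.id)) {
                result.add(be);
            }
        }
        return result;
    }

    // Safety net: even if the input list contains repeats, emit one target ID per backend.
    static List<Long> planTaskTargets(List<Backend> backends) {
        Set<Long> addedBackendIds = new HashSet<>();
        List<Long> targets = new ArrayList<>();
        for (Backend be : backends) {
            if (addedBackendIds.add(be.id)) {
                targets.add(be.id);
            }
        }
        return targets;
    }

    // Builds a small example: two address strings resolve to backend 1, one to backend 2.
    static List<Long> demo() {
        Backend be1 = new Backend(1L, "10.0.0.1:9050");
        Backend be2 = new Backend(2L, "10.0.0.2:9050");
        Map<String, Backend> byAddress = new HashMap<>();
        byAddress.put("10.0.0.1:9050", be1);
        byAddress.put("be-1:9050", be1); // hypothetical alias for the same backend
        byAddress.put("10.0.0.2:9050", be2);

        List<String> queries = List.of(
                "10.0.0.1:9050", "be-1:9050", "10.0.0.1:9050", "10.0.0.2:9050", "unknown:9050");
        return planTaskTargets(selectBackends(byAddress, queries));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch, keying the check on the numeric ID rather than on the query string is what lets an alias and a repeated address collapse to a single entry. The second pass in `planTaskTargets` is redundant when the first pass has run, which matches the "safety net" role the PR assigns to the check in `cleanTrash()`.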