
[fix](fe) Deduplicate backends in ADMIN CLEAN TRASH command #63314

Closed
heguanhui wants to merge 1 commit into apache:master from heguanhui:fix/admin-clean-trash-dedup


@heguanhui
Contributor

Summary

  • Fix a bug in the ADMIN CLEAN TRASH command where duplicate clean trash tasks could be sent to the same BE
  • Deduplicate by backend ID with a Set<Long> in both getNeedCleanedBackends() and cleanTrash()
  • Add the unit test AdminCleanTrashCommandTest, covering duplicate queries, distinct queries, and clean-all scenarios

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: The ADMIN CLEAN TRASH command could send duplicate clean trash tasks to the same BE. When backendsQuery contains duplicate entries (e.g., the same host:port specified twice), or when different string representations resolve to the same backend, getNeedCleanedBackends() could add that backend multiple times. In addition, cleanTrash() did not deduplicate by backend ID before creating tasks, so every duplicate in the list produced a separate CleanTrashTask for the same BE.

Root Cause

  1. In getNeedCleanedBackends(), backendsID.remove(backendQuery) deduplicates only by the string key (host:port). If the user supplies the same backend through different string representations or as duplicate entries, that backend could be added multiple times.

  2. In cleanTrash(), there was no deduplication by backend ID: the method iterated over the backends list and created a CleanTrashTask for every entry.

  3. CleanTrashTask uses signature -1 and is not added to AgentTaskQueue (only AgentTaskExecutor.submit(batchTask) is called), so the queue's deduplication mechanism does not apply.
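The first two root causes can be illustrated with a small, self-contained sketch. It is not the Doris FE code: `Backend`, the `ADDRESS_TO_BACKEND` alias table, and the methods `resolveByStringKey` and `taskTargets` are hypothetical stand-ins, and the sketch assumes (as the summary describes) that two different address strings can resolve to the same backend. Filtering on the query string lets both aliases through, and a task loop without an ID check then targets that backend twice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StringKeyDedupSketch {
    // Hypothetical stand-in for a backend; only the ID matters here.
    record Backend(long id, String address) {}

    static final Backend BE1 = new Backend(10001L, "10.0.0.1:9050");
    static final Backend BE2 = new Backend(10002L, "10.0.0.2:9050");

    // Hypothetical alias table: two different strings resolve to BE1.
    static final Map<String, Backend> ADDRESS_TO_BACKEND = Map.of(
            "10.0.0.1:9050", BE1,
            "be1.example:9050", BE1,
            "10.0.0.2:9050", BE2);

    // Pre-fix shape: duplicates are filtered by query string, not by backend ID.
    static List<Backend> resolveByStringKey(List<String> backendsQuery) {
        Set<String> seenKeys = new HashSet<>();
        List<Backend> result = new ArrayList<>();
        for (String query : backendsQuery) {
            Backend be = ADDRESS_TO_BACKEND.get(query);
            if (be != null && seenKeys.add(query)) {
                result.add(be);
            }
        }
        return result;
    }

    // Pre-fix shape of task creation: one target per list entry, no ID check.
    static List<Long> taskTargets(List<Backend> backends) {
        List<Long> targets = new ArrayList<>();
        for (Backend be : backends) {
            targets.add(be.id());
        }
        return targets;
    }

    public static void main(String[] args) {
        // BE1 is reached through two different strings, so both pass the string check.
        List<Backend> resolved = resolveByStringKey(
                List.of("10.0.0.1:9050", "be1.example:9050", "10.0.0.2:9050"));
        System.out.println(taskTargets(resolved)); // [10001, 10001, 10002]
    }
}
```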

Fix

  • Add a Set<Long> addedIds in getNeedCleanedBackends() to deduplicate by backend ID while processing backendsQuery
  • Add a Set<Long> addedBackendIds in cleanTrash() as a safety net that deduplicates by backend ID before creating tasks
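The two fixes listed above can be sketched in isolation as follows. This is a minimal illustration under hypothetical names (`Backend`, `CleanTrashTask`, `ADDRESS_TO_BACKEND`, `resolveBackendsById`, `buildCleanTrashTasks`); the real FE classes and method signatures may differ. Both steps skip any backend whose ID has already been seen, relying on `Set.add` returning `false` for an element that is already present, so each backend gets at most one task no matter how many aliases were supplied.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IdDedupSketch {
    // Hypothetical stand-ins; not the Doris FE classes.
    record Backend(long id, String address) {}
    record CleanTrashTask(long backendId) {}

    static final Backend BE1 = new Backend(10001L, "10.0.0.1:9050");
    static final Backend BE2 = new Backend(10002L, "10.0.0.2:9050");

    // Hypothetical alias table: two different strings resolve to BE1.
    static final Map<String, Backend> ADDRESS_TO_BACKEND = Map.of(
            "10.0.0.1:9050", BE1,
            "be1.example:9050", BE1,
            "10.0.0.2:9050", BE2);

    // Plays the role of getNeedCleanedBackends(): dedup by backend ID.
    static List<Backend> resolveBackendsById(List<String> backendsQuery) {
        Set<Long> addedIds = new HashSet<>();
        List<Backend> result = new ArrayList<>();
        for (String query : backendsQuery) {
            Backend be = ADDRESS_TO_BACKEND.get(query);
            // Set.add returns false when this ID was already added.
            if (be != null && addedIds.add(be.id())) {
                result.add(be);
            }
        }
        return result;
    }

    // Plays the role of cleanTrash(): safety net before creating tasks.
    static List<CleanTrashTask> buildCleanTrashTasks(List<Backend> backends) {
        Set<Long> addedBackendIds = new HashSet<>();
        List<CleanTrashTask> tasks = new ArrayList<>();
        for (Backend be : backends) {
            if (addedBackendIds.add(be.id())) {
                tasks.add(new CleanTrashTask(be.id()));
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Backend> backends = resolveBackendsById(
                List.of("10.0.0.1:9050", "be1.example:9050", "10.0.0.2:9050"));
        System.out.println(backends.size()); // 2

        // Even if duplicates reach the task loop directly, one task per ID is built.
        System.out.println(buildCleanTrashTasks(List.of(BE1, BE1, BE2)));
        // [CleanTrashTask[backendId=10001], CleanTrashTask[backendId=10002]]
    }
}
```

Tracking IDs in a separate `Set` while appending to a `List` keeps the backends in the order the user listed them; a `LinkedHashMap<Long, Backend>` keyed by ID would give the same result.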

Release note

Fixed a bug where ADMIN CLEAN TRASH could send duplicate clean tasks to the same backend when duplicate backend addresses were specified.

Check List (For Author)

  • Test: Unit Test
    • AdminCleanTrashCommandTest verifies dedup with duplicate queries, distinct queries, and clean-all scenarios
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added, and why were they added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

heguanhui closed this on May 16, 2026
heguanhui deleted the fix/admin-clean-trash-dedup branch on May 16, 2026 16:16