[fix](be) Fix time-sharing executor queued task count #63568
Merged
Keep the executor queued-task metric consistent when queued splits are removed before execution. The fix routes queue offer/remove operations through helpers that update _total_queued_tasks together with the split queue and token state. Add a regression test that enqueues splits with no worker threads, removes their tasks, and verifies the queue count returns to zero so later submissions are not rejected as full.
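The regression scenario described above can be sketched with a toy model. This is not the Doris code: `ToyExecutor`, `Split`, `_offer`, `_remove`, and `regression_scenario` are hypothetical names, and only `_total_queued_tasks` is taken from the PR text. The sketch assumes a capacity check against the counter and no worker threads draining the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a queued split; only the owning task id matters here.
struct Split {
    std::string task_id;
};

// Toy model (not the Doris class): the counter is only touched inside the
// _offer/_remove helpers, so it cannot drift from the real queue size.
class ToyExecutor {
public:
    explicit ToyExecutor(std::size_t capacity) : _capacity(capacity) {}

    // Returns false when the counter says the queue is full.
    bool submit(const Split& split) {
        if (_total_queued_tasks >= _capacity) return false;
        _offer(split);
        return true;
    }

    // Drops every queued split that belongs to the task before it runs.
    void remove_task(const std::string& task_id) {
        for (auto it = _queue.begin(); it != _queue.end();) {
            it = (it->task_id == task_id) ? _remove(it) : it + 1;
        }
    }

    std::size_t queued_count() const { return _total_queued_tasks; }
    std::size_t real_queue_size() const { return _queue.size(); }

private:
    void _offer(const Split& split) {
        _queue.push_back(split);
        ++_total_queued_tasks;
    }

    std::deque<Split>::iterator _remove(std::deque<Split>::iterator it) {
        --_total_queued_tasks;
        return _queue.erase(it);
    }

    std::size_t _capacity;
    std::size_t _total_queued_tasks = 0;
    std::deque<Split> _queue;
};

// Enqueue to capacity with nothing draining the queue, remove the task,
// then check that the count is back to zero and a new submission is accepted.
bool regression_scenario() {
    ToyExecutor exec(2);
    bool ok = exec.submit({"t1"}) && exec.submit({"t1"});
    ok = ok && !exec.submit({"t2"});  // queue is genuinely full here
    exec.remove_task("t1");
    ok = ok && exec.queued_count() == 0 && exec.real_queue_size() == 0;
    ok = ok && exec.submit({"t2"});   // must not be rejected as full
    return ok;
}
```

Calling `regression_scenario()` from a test body and asserting that it returns `true` mirrors the shape of the check the description outlines; the real unit test in the PR may be structured differently.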
Contributor
Author
run buildall
Contributor
TPC-H: Total hot run time: 31309 ms
Contributor
TPC-DS: Total hot run time: 171614 ms
Contributor
PR approved by at least one committer and no changes requested.
Contributor
PR approved by anyone and no changes requested.
Contributor
Author
Can it be patched to 4.1.0?
Contributor
@Jungzhang It will be released in 4.1.2. In the meantime, you can patch the PR onto 4.1.0 in your own environment and test it.
yiguolei
approved these changes
May 26, 2026
Contributor
/review
ghost
reviewed
May 26, 2026
Review result: no blocking issue found in the changed code.
Critical checkpoint conclusions:
- Goal/test: The PR fixes stale queued-task accounting when queued splits are removed and when auto-rescheduled splits are re-offered. The added unit test covers the remove_task capacity/count regression.
- Scope: The change is small and focused on centralizing queue offer/remove accounting.
- Concurrency: The modified counters and token state transitions are performed under _lock; task maps/sets remain under _mutex. I did not find a new lock-order or missed-notification issue in the changed paths.
- Lifecycle/static init: No new static initialization or non-obvious ownership lifecycle risk was introduced.
- Configuration/compatibility: No new config, protocol, storage, or compatibility surface.
- Parallel paths: The changed direct queue offers/removals in submit, auto-reschedule, task removal, and destructor are covered by the helper paths. Stop already resets the queue count after clearing.
- Tests: A targeted BE unit test was added for queued count cleanup after remove_task. I did not run the full BE unit suite in this review environment.
- Observability/performance: Existing metrics now receive a more accurate queued count; no obvious added hot-path overhead beyond constant-time accounting and existing queue-size scans on removal.
- User focus: No additional user-provided review focus was present.
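The concurrency and parallel-path points in the review can be illustrated with a small sketch. It assumes a single mutex named `_lock` guards both the queue and the counter, as the review states; the class `GuardedExecutor`, its methods, and `parallel_paths_demo` are hypothetical and do not reproduce the Doris implementation or its token state.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical model: every path that touches the queue takes _lock and goes
// through the *_locked helpers, so the counter and the queue change together.
class GuardedExecutor {
public:
    void submit(int split_id) {
        std::lock_guard<std::mutex> guard(_lock);
        _offer_locked(split_id);
    }

    // Models a worker taking a split and auto-rescheduling it.
    bool run_once_and_reschedule() {
        std::lock_guard<std::mutex> guard(_lock);
        if (_queue.empty()) return false;
        int split_id = _remove_front_locked();
        _offer_locked(split_id);
        return true;
    }

    // Models task removal (or destruction) dropping all queued splits.
    void remove_all() {
        std::lock_guard<std::mutex> guard(_lock);
        while (!_queue.empty()) _remove_front_locked();
    }

    // Models stop: clear the queue, then reset the count explicitly.
    void stop() {
        std::lock_guard<std::mutex> guard(_lock);
        _queue.clear();
        _total_queued_tasks = 0;
    }

    bool consistent() {
        std::lock_guard<std::mutex> guard(_lock);
        return _total_queued_tasks == _queue.size();
    }

    std::size_t queued_count() {
        std::lock_guard<std::mutex> guard(_lock);
        return _total_queued_tasks;
    }

private:
    // Callers must already hold _lock.
    void _offer_locked(int split_id) {
        _queue.push_back(split_id);
        ++_total_queued_tasks;
    }

    // Callers must already hold _lock and have checked the queue is non-empty.
    int _remove_front_locked() {
        int split_id = _queue.front();
        _queue.pop_front();
        --_total_queued_tasks;
        return split_id;
    }

    std::mutex _lock;
    std::deque<int> _queue;
    std::size_t _total_queued_tasks = 0;
};

// Walks the submit, reschedule, removal, and stop paths and checks the
// counter against the real queue size after each step.
bool parallel_paths_demo() {
    GuardedExecutor exec;
    exec.submit(1);
    exec.submit(2);
    bool ok = exec.consistent() && exec.queued_count() == 2;
    ok = ok && exec.run_once_and_reschedule() && exec.queued_count() == 2;
    exec.remove_all();
    ok = ok && exec.consistent() && exec.queued_count() == 0;
    exec.submit(3);
    exec.stop();
    return ok && exec.consistent() && exec.queued_count() == 0;
}
```

The design point is that no call site increments or decrements the counter directly; a path that bypassed the helpers would be the kind of gap the review's "Parallel paths" checkpoint looks for.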
What problem does this PR solve?
Issue Number: N/A
Related PR: N/A
Problem Summary:
TimeSharingTaskExecutor uses _total_queued_tasks for queue-size metrics and capacity checks. When queued splits were removed before execution, for example when a task was cancelled or removed, the split queue removed those splits but _total_queued_tasks was not decremented.
After repeated removals, _total_queued_tasks could become larger than the real queue size. This made the executor report a non-zero queue size even when there were no active or queued splits, and later submissions could be rejected as if the queue were full.
This PR keeps queue offer/remove operations consistent by updating _total_queued_tasks together with the split queue and token state.
Release note
Fix a bug where the time-sharing scan executor queue size could become inaccurate after queued splits were removed before execution.
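To make the drift in the problem summary concrete, here is a minimal sketch of the pre-fix behaviour under stated assumptions. `BuggyQueue` and `stale_count_after` are hypothetical names invented for illustration; the only thing taken from the PR is the idea that removal before execution shrinks the queue without decrementing the counter.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of the pre-fix behaviour, not the Doris source.
struct BuggyQueue {
    std::deque<int> splits;
    std::size_t total_queued_tasks = 0;  // plays the role of _total_queued_tasks
    std::size_t capacity;

    explicit BuggyQueue(std::size_t cap) : capacity(cap) {}

    // The capacity check trusts the counter, not the real queue size.
    bool submit(int split_id) {
        if (total_queued_tasks >= capacity) return false;
        splits.push_back(split_id);
        ++total_queued_tasks;
        return true;
    }

    // The bug being illustrated: the queue is emptied, the counter is not touched.
    void remove_all_before_execution() { splits.clear(); }
};

// Repeats "enqueue per_round splits, then remove them" and returns the value
// the counter reports afterwards, while the real queue is empty.
std::size_t stale_count_after(std::size_t rounds, std::size_t per_round,
                              std::size_t capacity) {
    BuggyQueue q(capacity);
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < per_round; ++i) {
            q.submit(static_cast<int>(i));
        }
        q.remove_all_before_execution();
    }
    return q.total_queued_tasks;
}
```

With three rounds of two splits and a capacity of six, the counter in this model ends at six while the queue is empty, so the next submission would be rejected as full. Decrementing the counter inside the removal path, as the PR describes, removes the drift.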