[Chore](pipeline) make wake up do not return error #62190

BiteTheDDDDt merged 3 commits into apache:master from
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:

run buildall

/review
Pull request overview
This PR hardens the pipeline dependency wake-up path by making Dependency::set_ready() non-throwing and adding exception handling around task wake-up to avoid leaving tasks stuck in BLOCKED state.
Changes:
- Mark `Dependency::set_ready()` as `noexcept`.
- Add try/catch around `PipelineTask::wake_up()` to cancel the query if wake-up fails for a non-finalized task.
- Handle both Doris exceptions and standard/unexpected exceptions during wake-up.
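The pattern these changes describe can be sketched standalone as follows. All names here are illustrative stand-ins, not the actual Doris classes: a `noexcept` `set_ready()` that wraps the wake-up call in try/catch and falls back to cancelling the query so a blocked task cannot be stranded by an exception.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <string>

// Hypothetical minimal types; the real PipelineTask/QueryContext are far richer.
struct Query {
    bool cancelled = false;
    void cancel(const std::string& /*reason*/) { cancelled = true; }
};

struct Task {
    Query* query;
    bool finalized = false;
    bool throw_on_wake = false;
    bool woken = false;
    void wake_up() {
        if (throw_on_wake) throw std::bad_alloc();
        woken = true;
    }
};

struct Dependency {
    bool ready = false;
    // noexcept: exceptions from wake-up are handled here, never propagated.
    void set_ready(Task* t) noexcept {
        ready = true;
        try {
            t->wake_up();
        } catch (const std::exception&) {
            // A finalized task missing a wake-up is benign; otherwise cancel
            // the query so it does not hang in BLOCKED state forever.
            if (!t->finalized) {
                t->query->cancel("wake_up failed");
            }
        }
    }
};
```

Usage-wise, the interesting property is that `set_ready()` always completes: the caller never sees an exception, and failure is converted into query cancellation rather than a hang.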
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| be/src/exec/pipeline/dependency.h | Marks Dependency::set_ready() as noexcept. |
| be/src/exec/pipeline/dependency.cpp | Wraps task wake-up logic with exception handling and best-effort recovery via query cancellation. |
Found 1 issue.
be/src/exec/pipeline/dependency.cpp: the new recovery path treats only `FINALIZED` tasks as benign, but `PipelineTask::close()` moves tasks to `FINISHED` before `finalize()` runs. A delayed dependency wake-up in that window can return `Task state transition from FINISHED to RUNNABLE is not allowed`, and this patch now escalates that benign late notification into `frag->cancel(...)`, turning task completion races into query failures.
Critical checkpoint conclusions:
- Goal / proof: The goal is to make `Dependency::set_ready()` non-throwing and avoid hangs. The current code does not fully accomplish that safely because it can incorrectly cancel a completing query. No new test covers the delayed wake-up / completion race.
- Small / focused change: Yes, the patch is small and localized.
- Concurrency: Applicable. The change is in a concurrent wake-up path; lock scope and ordering remain small, but the completion-state classification in the race window is incorrect.
- Lifecycle / static init: No static-init concern. Task lifecycle is central here, and the review finding is in the `FINISHED -> FINALIZED` window.
- Configuration: Not applicable.
- Compatibility / storage format: Not applicable.
- Parallel code paths: No additional analogous path requiring the same change was identified.
- Special conditional checks: The new `t->is_finalized()` check is too narrow for the intended "already completed" case.
- Test coverage: Missing targeted coverage for delayed wake-up and cancellation/completion races.
- Observability: The added warning logs include the dependency name and status; observability is otherwise adequate.
- Transaction / persistence: Not applicable.
- Data writes / atomicity: Not applicable.
- FE-BE variable passing: Not applicable.
- Performance: No major performance issue identified in this patch.
- Other issues: None beyond the false-positive cancellation above.
```cpp
    local_block_task.swap(_blocked_task);
}
for (auto task : local_block_task) {
    if (auto t = task.lock()) {
```
`t->is_finalized()` is too narrow for the new "benign late wake-up" check. `PipelineTask::close()` sets the state to `FINISHED` before `finalize()` flips it to `FINALIZED` (pipeline_task.cpp), so a delayed dependency notification can legitimately hit a task that is already completed but not yet finalized. In that window `wake_up()` returns `Task state transition from FINISHED to RUNNABLE is not allowed`, and this code now cancels the whole fragment even though the task has already finished.
Please treat the pre-finalize `FINISHED` state as completed too, or key off the specific state-transition failure instead of only `is_finalized()`.
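The race window above can be shown with a tiny state machine. This is a sketch with hypothetical names, not the real `PipelineTask`: `close()` moves the task to `FINISHED` and only a later `finalize()` moves it to `FINALIZED`, so a "completed" check keyed solely on `FINALIZED` misclassifies tasks in between.

```cpp
#include <cassert>

// Illustrative two-step completion, mirroring the FINISHED -> FINALIZED
// window the review describes.
enum class TaskState { BLOCKED, RUNNABLE, FINISHED, FINALIZED };

struct Task {
    TaskState state = TaskState::BLOCKED;
    void close()    { state = TaskState::FINISHED; }   // step 1 of completion
    void finalize() { state = TaskState::FINALIZED; }  // step 2, may run later
};

// Too narrow: mirrors checking only is_finalized().
bool is_completed_narrow(const Task& t) {
    return t.state == TaskState::FINALIZED;
}

// Broadened check suggested by the review: a FINISHED task is also done
// and a late wake-up for it is benign, not a reason to cancel the query.
bool is_completed(const Task& t) {
    return t.state == TaskState::FINISHED || t.state == TaskState::FINALIZED;
}
```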
/review

run buildall
Found 1 issue that should be fixed before merge.
Critical checkpoints:
- Goal / correctness: Not satisfied. The new noexcept path still allows blocked tasks to be stranded if an exception aborts the wake-up loop before fragment cancellation.
- Minimality / focus: Yes. The change is small and localized to `Dependency::set_ready()`.
- Concurrency: Risk remains. `_ready` is published and `_blocked_task` is cleared before every waiter is safely handled, so an exception can break the task lifecycle.
- Lifecycle / initialization: No new lifecycle or static initialization issues seen.
- Configuration: N/A.
- Compatibility: N/A.
- Parallel paths: No additional matching path updates appear required.
- Special checks: The new catch blocks do not guarantee cancellation or recovery on the exception path.
- Test coverage: No test covers the `wake_up` exception path.
- Observability: Non-OK status logging is useful, but the `std::exception` log says "cancelling query" even though this path does not cancel.
- Transaction / persistence: N/A.
- Data writes / modifications: N/A.
- FE-BE variable passing: N/A.
- Performance: No primary hot-path regression identified.
- Other issues: None beyond the exception-handling hang risk above.
Overall opinion: not ready to merge as-is.
```cpp
} catch (const std::exception& e) {
    // Non-Doris exceptions (e.g. std::bad_alloc from scheduler submit path).
    LOG(WARNING) << "Dependency::set_ready(): unexpected exception during wake_up, "
                    "cancelling query. dep="
```
Once `_ready` is set and `_blocked_task` is swapped out, any exception that escapes one iteration aborts the rest of `local_block_task`. The outer catch then only logs and returns. A concrete case is `std::bad_alloc` thrown from `t->wake_up()` or from the logging/cancel code in this loop: the current task is never resubmitted, the dependency is already marked ready, and no later `set_ready()` call will revisit the dropped waiters. That leaves the fragment hanging instead of cancelling it. The exception handling needs to be per task, with fragment cancellation or equivalent unblock logic before continuing to the next waiter.
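The per-task structure being asked for can be sketched like this, with hypothetical types standing in for the pipeline classes: the try/catch sits inside the loop, so one waiter's failure is handled (here via a cancel callback) and the remaining waiters still get woken rather than being silently dropped.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <new>
#include <vector>

// Illustrative waiter; the real code holds weak_ptr<PipelineTask>.
struct Waiter {
    bool woken = false;
    bool throws = false;
    void wake_up() {
        if (throws) throw std::bad_alloc();
        woken = true;
    }
};

// Per-task exception handling: a failure for one waiter is contained and
// compensated, and iteration continues to the next waiter.
void wake_all(std::vector<std::weak_ptr<Waiter>>& waiters,
              const std::function<void()>& cancel_fragment) {
    for (auto& w : waiters) {
        if (auto t = w.lock()) {
            try {
                t->wake_up();
            } catch (...) {
                // Compensate for this task (e.g. cancel its fragment), then
                // keep going so later waiters are not stranded.
                cancel_fragment();
            }
        }
    }
}
```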
/review

run buildall
Findings:
- be/src/exec/pipeline/dependency.cpp: the new `wake_up()` error path leaves the task in an inconsistent lifecycle state. `PipelineTask::wake_up()` clears `_blocked_dep` and transitions the task to `RUNNABLE` before calling `submit()`. If `submit()` returns non-OK here, `set_ready()` now only cancels the fragment and returns. At that point the task is no longer in `_blocked_task`, is no longer blocked on the dependency, and was never enqueued, so no worker thread will ever run `close_task()` / `decrement_running_task()` for it. The fragment cancel path only calls `unblock_all_dependencies()`; it does not schedule or close this stranded task. This is a real hang / leaked-running-task regression on the exact scheduler-failure path the old `THROW_IF_ERROR` propagated to the caller.
- be/src/exec/pipeline/dependency.cpp: the new `std::exception` catch still aborts the BE process via `CHECK(false)`. That contradicts the PR goal of handling `std::bad_alloc` and other standard exceptions gracefully. On the OOM/logging/cancel failure path described in the PR body, this change still turns the query-local failure into a process crash.
Critical checkpoint conclusions:
- Goal of current task: make `Dependency::set_ready()` non-throwing and avoid tasks hanging when wake-up/recovery fails. Conclusion: not achieved; the new `submit()` failure handling can strand a task permanently, and the `std::exception` branch still aborts the process. No test in this PR demonstrates the new behavior.
- Minimal / focused change: mostly focused to `Dependency::set_ready()`, but the behavioral change is not safely integrated with the surrounding task lifecycle.
- Concurrency: applicable. `Dependency::_blocked_task`, `PipelineTask::_blocked_dep`, and task scheduler queues are part of the wake-up handshake. The new code keeps the lock scope lightweight, but it breaks the state-machine contract when `wake_up()` fails after mutating task state.
- Lifecycle management: applicable. Task lifecycle is `BLOCKED -> RUNNABLE -> scheduled/closed`; the new path can stop after `RUNNABLE` without ever reaching scheduler ownership or close/finalize.
- Configuration items: not applicable.
- Compatibility changes: not applicable.
- Parallel code paths: applicable. Other wake-up/unblock paths (`PipelineFragmentContext::cancel()`, `Pipeline::make_all_runnable()`, `PipelineTask::unblock_all_dependencies()`) still assume dependency wake-up either succeeds or propagates failure; they do not repair the stranded-task case introduced here.
- Special conditional checks: applicable. The new `if (!st.ok())` recovery branch needs stronger guarantees or compensation logic, which are currently missing.
- Test coverage: insufficient. I did not find a regression test or unit test covering failed `wake_up()` / failed scheduler submission / exception during recovery.
- Observability: partial. The warning log includes dependency/task/status, which is good, but observability does not compensate for the stranded-task bug or the process-abort path.
- Transaction / persistence: not applicable.
- Data writes / modifications: not applicable.
- FE/BE variable passing: not applicable.
- Performance: no material concern in the patch itself; correctness/lifecycle issues dominate.
- Other issues: none beyond the two findings above.
Overall opinion: the PR should not be merged as-is because it introduces a task-lifecycle regression on wake-up failure and does not actually provide graceful handling for std::exception / OOM paths.
```cpp
if (auto t = task.lock()) {
    std::unique_lock<std::mutex> lc(_task_lock);
    auto st = t->wake_up(this, lc);
    if (!st.ok()) {
```
`PipelineTask::wake_up()` is not an atomic "enqueue me" operation. It first clears `_blocked_dep` and transitions the task to `RUNNABLE`, and only then calls `submit()` (pipeline_task.cpp:1047-1055). If `submit()` returns non-OK here, this new branch only cancels the fragment and returns. The task has already been removed from `_blocked_task`, is no longer blocked on this dependency, and was never inserted into the scheduler queue, so no worker thread will ever call `close_task()` / `decrement_running_task()` for it. `PipelineFragmentContext::cancel()` only calls `unblock_all_dependencies()`, which does not repair that stranded-task state. This turns a scheduler failure into a potential permanent hang/leaked running-task count.
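The stranded-task hazard comes from mutating state before the fallible `submit()` call. A minimal sketch of the needed compensation, using illustrative types rather than the real Doris classes: if `submit()` fails after the task has already been moved to `RUNNABLE`, the wake-up path itself must close the task and release its running-task slot, because no worker thread will ever do it.

```cpp
#include <cassert>

// Hypothetical stand-in for the task scheduler's submit path.
struct Scheduler {
    bool fail_submit = false;
    int queued = 0;
    bool submit() {
        if (fail_submit) return false;
        ++queued;
        return true;
    }
};

struct Task {
    enum class State { BLOCKED, RUNNABLE, FINISHED } state = State::BLOCKED;
    int running_tasks = 1;  // stands in for the fragment's running-task count

    // Returns true if the task ended up owned by the scheduler.
    bool wake_up(Scheduler& sched) {
        state = State::RUNNABLE;      // state is mutated *before* submit()
        if (sched.submit()) return true;
        // Compensation on submit failure: close the task and release its
        // running slot here, since no worker thread will ever run it.
        state = State::FINISHED;
        --running_tasks;
        return false;
    }
};
```

The design point is that every exit path leaves the task either scheduler-owned or fully closed; there is no state where it is runnable but unreachable.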
run buildall

BE UT Coverage Report: Increment line coverage / Increment coverage report

BE Regression && UT Coverage Report: Increment line coverage / Increment coverage report

update update fmt update

run buildall

/review
Found 1 issue.
be/src/exec/pipeline/pipeline_task.h: the `wake_up()` signature change was not propagated to every call site. be/test/exec/pipeline/pipeline_task_test.cpp:557 still calls `task->wake_up(dep.get(), lc).ok()`, so this PR does not compile as-is.
Critical checkpoint conclusions:
- Goal of the task: The goal is to make `Dependency::set_ready()` stop propagating `Status` from `wake_up()`. The implementation is close, but the remaining stale test call means the change set does not currently build, so the goal is not fully accomplished.
- Modification size/focus: Yes. The patch is small and focused on the wake-up/error-handling path.
- Concurrency: I reviewed the `Dependency::set_ready() -> PipelineTask::wake_up()` path. It remains concurrency-sensitive, but I did not confirm a separate lock-order regression from this diff beyond the build break.
- Special lifecycle/static initialization: No special lifecycle or static initialization issue is introduced in the reviewed code.
- Configuration changes: None.
- Compatibility/storage format changes: None.
- Parallel code paths: The production caller in `Dependency::set_ready()` was updated, but the unit test still uses the old return contract, so not all relevant paths were updated.
- Special conditional checks: The `FINISHED`/`FINALIZED` guard remains justified by the existing comment and matches the intended no-resubmit behavior.
- Test coverage: Incomplete. The updated signature currently breaks an existing unit test call site, so there is no evidence of passing coverage for this refactor.
- Observability: No additional logging or metrics appear necessary for this narrow refactor.
- Transaction/persistence/data-write concerns: Not applicable.
- FE-BE variable passing: Not applicable.
- Performance: No obvious steady-state performance regression is visible in the reviewed diff.
- Other issues: None beyond the compile break above.
```diff
 }

-Status wake_up(Dependency* dep, std::unique_lock<std::mutex>& /* dep_lock */);
+void wake_up(Dependency* dep, std::unique_lock<std::mutex>& /* dep_lock */);
```
This signature change is not fully propagated yet. be/test/exec/pipeline/pipeline_task_test.cpp:557 still does `EXPECT_TRUE(task->wake_up(dep.get(), lc).ok());`, so the PR no longer compiles because `wake_up()` now returns `void`. Please update that remaining test call site together with the two lines already changed above.
run buildall

BE Regression && UT Coverage Report: Increment line coverage / Increment coverage report

run buildall

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

BE UT Coverage Report: Increment line coverage / Increment coverage report

BE Regression && UT Coverage Report: Increment line coverage / Increment coverage report
This pull request refactors the `PipelineTask::wake_up` method to improve error handling and simplify its interface. The method now returns `void` instead of `Status`, and errors are handled internally by canceling the associated fragment if necessary. Corresponding updates are made throughout the codebase and tests to accommodate this change.

Core Refactoring and Error Handling:
- Changed `PipelineTask::wake_up` from returning `Status` to `void`, and updated its implementation to handle errors by canceling the fragment context internally, rather than propagating `Status` up the call stack. [1] [2]
- Updated the `Dependency::set_ready` method to call the new `wake_up` signature without error checking, since errors are now handled inside `wake_up`.

Test Adjustments:
- Updated pipeline_task_test.cpp to remove checks on the return value of `wake_up` and instead directly call the new `void` method. [1] [2]
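The final contract described above can be sketched in isolation. This is an illustrative model only (the names `FragmentContext`, `Task`, `submit` are stand-ins, not the Doris API): `wake_up()` returns `void` and converts a failed submission into a fragment cancellation internally, so callers like `Dependency::set_ready()` no longer check a `Status`.

```cpp
#include <cassert>
#include <string>

// Hypothetical fragment context that records cancellation.
struct FragmentContext {
    bool cancelled = false;
    std::string reason;
    void cancel(const std::string& why) {
        cancelled = true;
        reason = why;
    }
};

struct Task {
    FragmentContext* frag;
    bool submitted = false;
    bool fail_submit = false;

    bool submit() {
        if (fail_submit) return false;
        submitted = true;
        return true;
    }

    // New contract: void return; errors are handled here by cancelling the
    // fragment rather than being propagated up the call stack.
    void wake_up() {
        if (!submit()) {
            frag->cancel("wake_up failed: scheduler submit error");
        }
    }
};
```

With this shape, the caller side collapses from `EXPECT_TRUE(task->wake_up(...).ok())`-style checks to a plain call, matching the test adjustments listed above.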