@BiteTheDDDDt
Contributor

@BiteTheDDDDt BiteTheDDDDt commented Jan 28, 2026

What problem does this PR solve?

  1. The query_ctx might not be found in rerun_fragment, which could result in some fragments not being promptly notified for release.
  2. Set _need_notify_close to false in cancel_query so that fragments do not wait for wait_close.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings January 28, 2026 10:20
@Thearas
Contributor

Thearas commented Jan 28, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

This PR improves recursive CTE fragment rerun/release behavior by avoiding reliance on QueryContext lookup in FragmentMgr::rerun_fragment, reducing cases where fragments aren’t notified/released promptly.

Changes:

  • Update FragmentMgr::rerun_fragment to operate purely via the pipeline fragment context map rather than get_query_ctx().
  • Add SCOPED_ATTACH_TASK(_query_ctx.get()) inside several PipelineFragmentContext rerun-related methods to ensure proper task context attachment.
  • Adjust recursive CTE RPC rerun loop to continue across fragments and return a consolidated status.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| be/src/runtime/fragment_mgr.cpp | Removes dependency on QueryContext lookup for rerun operations, using _pipeline_map directly. |
| be/src/pipeline/pipeline_fragment_context.cpp | Attaches task context within wait/release/rebuild rerun operations to support the new caller behavior. |
| be/src/pipeline/exec/rec_cte_source_operator.h | Changes rerun-fragment RPC error handling to continue across targets and return a final status. |

return fragment_ctx->set_to_rerun();
} else if (stage == PRerunFragmentParams::rebuild) {
return fragment_ctx->rebuild(_thread_pool.get());
} else if (stage == PRerunFragmentParams::submit) {
Copilot AI Jan 28, 2026

rerun_fragment() no longer attaches a task context, and the submit stage now calls fragment_ctx->submit() without any SCOPED_ATTACH_TASK. Other stages (wait_close/set_to_rerun/rebuild) now attach inside the callee, but submit() does not, making thread context / signal task id / mem tracking inconsistent for the submit path. Consider attaching in this caller for the submit branch via SCOPED_ATTACH_TASK(fragment_ctx->get_query_ctx()) (or equivalent) before calling submit().

Suggested change
- } else if (stage == PRerunFragmentParams::submit) {
+ } else if (stage == PRerunFragmentParams::submit) {
+     SCOPED_ATTACH_TASK(fragment_ctx->get_query_ctx());
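For readers unfamiliar with the pattern, the idea behind a scoped attach can be sketched as an RAII guard over thread-local state. This is only an illustration under stated assumptions: `FakeQueryContext`, `ScopedAttach`, `submit_sees_query`, and `rerun_submit` are hypothetical names, and the sketch does not reproduce how `SCOPED_ATTACH_TASK` is actually implemented in Doris.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a query context; not the Doris QueryContext type.
struct FakeQueryContext {
    std::string query_id;
};

// Thread-local "currently attached" context, standing in for per-thread task state.
static thread_local const FakeQueryContext* t_attached_ctx = nullptr;

// RAII guard: attaches on construction and restores the previous value on destruction.
class ScopedAttach {
public:
    explicit ScopedAttach(const FakeQueryContext* ctx) : _prev(t_attached_ctx) {
        assert(ctx != nullptr);
        t_attached_ctx = ctx;
    }
    ~ScopedAttach() { t_attached_ctx = _prev; }
    ScopedAttach(const ScopedAttach&) = delete;
    ScopedAttach& operator=(const ScopedAttach&) = delete;

private:
    const FakeQueryContext* _prev;
};

// Hypothetical submit(): reports which query the calling thread is attributed to.
std::string submit_sees_query() {
    return t_attached_ctx ? t_attached_ctx->query_id : "<unattached>";
}

// Caller-side attach, mirroring the shape of the suggested change above.
std::string rerun_submit(const FakeQueryContext* ctx) {
    ScopedAttach guard(ctx);
    return submit_sees_query();
}
```

In this sketch, a call to `submit_sees_query()` outside `rerun_submit()` sees no attached context, which corresponds to the inconsistency described for the submit path.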

Copilot uses AI. Check for mistakes.
print_id(query_id));
return Status::InvalidArgument("Unknown rerun fragment opcode: {}", stage);
}
return Status::OK();
Copilot AI Jan 28, 2026

This return Status::OK(); is unreachable because every branch above returns. Removing it would avoid dead code and make control flow clearer.
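
A minimal sketch of why the trailing return is dead code, assuming a dispatch chain shaped like the one in the diff. The `Stage` values and `rerun_stage_name` are hypothetical; the real opcodes are defined in `PRerunFragmentParams`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stage codes; the real values live in PRerunFragmentParams.
enum Stage { kWaitClose = 0, kSetToRerun = 1, kRebuild = 2, kSubmit = 3 };

// Every branch of the chain returns, including the final else,
// so no statement placed after the chain can ever execute.
std::string rerun_stage_name(int stage) {
    if (stage == kWaitClose) {
        return "wait_close";
    } else if (stage == kSetToRerun) {
        return "set_to_rerun";
    } else if (stage == kRebuild) {
        return "rebuild";
    } else if (stage == kSubmit) {
        return "submit";
    } else {
        return "unknown opcode " + std::to_string(stage);
    }
    // A `return` placed here would be unreachable.
}
```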

Comment on lines 197 to 201
st = Status::InternalError(controller.ErrorText());
}

auto rpc_st = Status::create(result.status());
if (!rpc_st.ok()) {
Copilot AI Jan 28, 2026

_send_rerun_fragments() now continues on RPC failures but overwrites st on each error, so the final returned status depends on the last failing fragment and may lose the original/root-cause error (and which fragment failed). Also, when controller.Failed() is true, result.status() is not meaningful; consider skipping the Status::create(result.status()) path for that iteration (e.g., continue) and preserving the first failure (or aggregating errors with fragment id/address) instead of overwriting.

Suggested change
-     st = Status::InternalError(controller.ErrorText());
- }
- auto rpc_st = Status::create(result.status());
- if (!rpc_st.ok()) {
+     if (st.ok()) {
+         st = Status::InternalError(controller.ErrorText());
+     }
+     // When the RPC controller reports failure, the result status is not reliable.
+     continue;
+ }
+ auto rpc_st = Status::create(result.status());
+ if (!rpc_st.ok() && st.ok()) {
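
The "preserve the first failure" approach can be sketched in isolation as follows. This is a simplified model, not the code in `rec_cte_source_operator.h`: `SimpleStatus`, `RpcOutcome`, and `send_to_all` are hypothetical stand-ins, and the RPC itself is replaced by precomputed outcomes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for a status type; not the Doris Status class.
struct SimpleStatus {
    bool is_ok;
    std::string msg;
    static SimpleStatus OK() { return SimpleStatus{true, ""}; }
    static SimpleStatus Error(const std::string& m) { return SimpleStatus{false, m}; }
    bool ok() const { return is_ok; }
};

// Hypothetical record of one RPC attempt to one target.
struct RpcOutcome {
    std::string target;            // illustrative target label
    bool controller_failed;        // transport-level failure
    std::string controller_error;  // text reported by the controller
    SimpleStatus result_status;    // only meaningful when controller_failed is false
};

// Visits every target, but keeps only the first error and tags it with its target.
SimpleStatus send_to_all(const std::vector<RpcOutcome>& outcomes) {
    SimpleStatus first_error = SimpleStatus::OK();
    for (const auto& o : outcomes) {
        if (o.controller_failed) {
            if (first_error.ok()) {
                first_error = SimpleStatus::Error(o.target + ": " + o.controller_error);
            }
            continue;  // the result status is not reliable after a transport failure
        }
        if (!o.result_status.ok() && first_error.ok()) {
            first_error = SimpleStatus::Error(o.target + ": " + o.result_status.msg);
        }
    }
    return first_error;
}
```

With this shape, a later failure on another target no longer hides the root cause, and the loop still reaches every target.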

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (21/28) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.45% (25583/35804)
Line Coverage 54.08% (267372/494424)
Region Coverage 51.71% (222682/430664)
Branch Coverage 53.11% (95635/180057)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (21/28) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.45% (25583/35804)
Line Coverage 54.08% (267377/494424)
Region Coverage 51.72% (222730/430664)
Branch Coverage 53.12% (95652/180057)

@BiteTheDDDDt
Contributor Author

run buildall

3 participants