
[fix](pipeline) avoid data queue sink dependency lost wakeup #63055

Merged
Mryange merged 1 commit into apache:master from Mryange:fix-data-queue-sink-dep
May 7, 2026

Conversation

@Mryange
Contributor

@Mryange Mryange commented May 7, 2026

What problem does this PR solve?

Issue Number: N/A

Problem Summary:
`DataQueueTest.MultiTest` could intermittently hang after DataQueue moved sink dependency notifications outside the per-sub-queue lock. Root cause: `SubQueue` queue state and `sink_dependency` state were no longer serialized by `queue_lock`, so a producer could observe its sink dependency as blocked even after the queue had already become empty, leaving no future push/pop to wake it. This patch updates `sink_dependency->set_ready()` and `sink_dependency->block()` while holding `queue_lock`, keeping queue occupancy and sink readiness transitions atomic with respect to each other.
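
One way to see the root cause is to replay the interleaving step by step. The sketch below is single-threaded and only models the order of events. `ToyDependency`, `max_blocks`, `ReplayResult`, and both `replay_*` functions are hypothetical names for illustration, not Doris code, and the full/empty thresholds are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a pipeline Dependency: only a ready flag.
struct ToyDependency {
    bool ready = true;
    void set_ready() { ready = true; }
    void block() { ready = false; }
};

struct ReplayResult {
    std::size_t queued;  // blocks left in the sub-queue
    bool sink_ready;     // final state of the sink dependency
};

// Each side decides under the lock but notifies after releasing it.
ReplayResult replay_notify_outside_lock() {
    std::deque<int> blocks;
    ToyDependency sink_dependency;
    const std::size_t max_blocks = 1;  // assumed limit

    // Producer, holding queue_lock: push and decide to block.
    blocks.push_back(42);
    const bool producer_wants_block = blocks.size() >= max_blocks;
    // Producer releases queue_lock and is descheduled before notifying.

    // Consumer, holding queue_lock: pop and decide to wake the sink.
    blocks.pop_front();
    const bool consumer_wants_ready = blocks.size() < max_blocks;
    // Consumer releases queue_lock and notifies first.
    if (consumer_wants_ready) sink_dependency.set_ready();

    // Producer resumes and applies its now-stale decision.
    if (producer_wants_block) sink_dependency.block();

    return {blocks.size(), sink_dependency.ready};
}

// Each side decides and notifies inside the same critical section.
ReplayResult replay_notify_under_lock() {
    std::deque<int> blocks;
    ToyDependency sink_dependency;
    const std::size_t max_blocks = 1;  // assumed limit

    // Producer critical section: push, then block while still locked.
    blocks.push_back(42);
    if (blocks.size() >= max_blocks) sink_dependency.block();

    // Consumer critical section: pop, then wake while still locked.
    blocks.pop_front();
    if (blocks.size() < max_blocks) sink_dependency.set_ready();

    return {blocks.size(), sink_dependency.ready};
}
```

In the first replay the queue ends empty while the sink stays blocked, which matches the hang described above. In the second, the consumer's transition is the last one applied, so the sink ends ready.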

Related PR: #62947

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented May 7, 2026

run buildall

@Mryange
Contributor Author

Mryange commented May 7, 2026

/review

Contributor

@github-actions github-actions Bot left a comment


Request changes.

The PR changes only DataQueue locking/notification behavior, and it regresses the previous lock-order separation around Dependency notifications.

Critical checkpoint conclusions:

  • Goal/test: the change appears intended to simplify queue operations, but no targeted concurrency test demonstrates that dependency notifications remain safe.
  • Scope: small and focused, but it changes concurrency semantics in a hot pipeline path.
  • Concurrency: applicable. Source/sink pipeline tasks share SubQueue::queue_lock and Dependency state. Calling Dependency::set_ready()/block() while holding queue_lock creates nested lock ordering and should be kept outside queue_lock.
  • Lifecycle/static initialization: not applicable.
  • Configuration/compatibility/protocol/persistence/data writes: not applicable.
  • Parallel paths: clear_blocks() and set_source_block() still deliberately notify outside their queue/source locks; try_pop()/try_push() should follow the same pattern.
  • Error handling/memory/observability/performance: no additional issues found beyond the lock-order/contention regression.
  • Tests: no new test coverage for this concurrency behavior was found.
  • User focus: no additional user-provided review focus.

Comment thread be/src/exec/operator/data_queue.cpp
@Mryange
Contributor Author

Mryange commented May 7, 2026

/review

Contributor

@github-actions github-actions Bot left a comment


Reviewed PR 63055 against c5883bb4ae11c7b477052d6332cc9bb3da76a40d..9b9fa843898f6387160bf4392a45cc9e5b294da5.

No new distinct blocking issue found beyond the already-open inline thread on SubQueue::try_pop()/try_push() dependency notification while holding queue_lock. I did not duplicate that existing thread.

Critical checkpoint conclusions:

  • Goal and tests: The PR aims to avoid a lost wakeup in DataQueue sink dependency state by serializing queue occupancy and sink readiness transitions. The modified path is covered by existing DataQueueTest.MultiTest, but the PR body does not show a completed test run.
  • Scope: The change is small and focused on be/src/exec/operator/data_queue.cpp.
  • Concurrency: The modified code is concurrency-sensitive between sink producer pipeline tasks and source consumer pipeline tasks. Per-subqueue state remains guarded by queue_lock; source dependency transitions remain guarded by _source_lock. The existing review thread already covers the main concern about invoking Dependency::{set_ready,block} under queue_lock.
  • Lifecycle/static initialization: No new static/global lifecycle issue found.
  • Configuration: No configuration item added.
  • Compatibility/storage/transactions: No serialization, storage-format, FE-BE protocol, or transaction persistence change found.
  • Parallel code paths: The affected DataQueue paths are used by union/cache shared-state operators; no separate parallel implementation requiring the same local edit was found.
  • Special condition checks: No new special defensive condition identified.
  • Test result files: No regression .out files changed.
  • Observability: No new observability needed for this narrowly scoped scheduling fix.
  • Performance: No new clear performance issue found aside from the lock/notification concern already being discussed.

User focus: .opencode-review.6swglQ/review_focus.txt says there is no additional user-provided review focus.
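
The property the concurrency checkpoint relies on can be stated as an invariant: whenever `queue_lock` is free, the sink is ready exactly when the sub-queue is below its limit. A minimal sketch, assuming a hypothetical `ToySubQueue` with a fixed `max_blocks >= 1` limit and a hypothetical `ToyDependency` flag (the real `SubQueue` thresholds and `Dependency` class may differ):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a pipeline Dependency: only a ready flag.
struct ToyDependency {
    std::atomic<bool> ready{true};
    void set_ready() { ready.store(true); }
    void block() { ready.store(false); }
    bool is_ready() const { return ready.load(); }
};

// Hypothetical sub-queue: occupancy and sink readiness change under one lock.
class ToySubQueue {
public:
    // Assumes max_blocks >= 1; with 0 the sink could never become ready.
    explicit ToySubQueue(std::size_t max_blocks) : _max_blocks(max_blocks) {}

    void try_push(int block) {
        std::lock_guard<std::mutex> guard(_queue_lock);
        _blocks.push_back(block);
        if (_blocks.size() >= _max_blocks) {
            sink_dependency.block();  // still holding _queue_lock
        }
    }

    bool try_pop(int* out) {
        std::lock_guard<std::mutex> guard(_queue_lock);
        if (_blocks.empty()) {
            return false;
        }
        *out = _blocks.front();
        _blocks.pop_front();
        if (_blocks.size() < _max_blocks) {
            sink_dependency.set_ready();  // still holding _queue_lock
        }
        return true;
    }

    // True when readiness agrees with occupancy, checked under the lock.
    bool invariant_holds() {
        std::lock_guard<std::mutex> guard(_queue_lock);
        return sink_dependency.is_ready() == (_blocks.size() < _max_blocks);
    }

    ToyDependency sink_dependency;

private:
    std::mutex _queue_lock;
    std::deque<int> _blocks;
    const std::size_t _max_blocks;
};
```

With `ToySubQueue q(2)`, two calls to `try_push` leave the sink blocked and a single `try_pop` makes it ready again, and `invariant_holds()` stays true after each step. The cost is the nested call into the dependency while `_queue_lock` is held, which is the concern raised in the open inline thread.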

@Mryange
Contributor Author

Mryange commented May 7, 2026

run nonConcurrent

@Mryange Mryange merged commit 17bbba4 into apache:master May 7, 2026
33 of 34 checks passed
Mryange added a commit to Mryange/doris that referenced this pull request May 15, 2026
…63055)

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
`DataQueueTest.MultiTest` could intermittently hang after DataQueue
moved sink dependency notifications outside the per-sub-queue lock. Root
cause: `SubQueue` queue state and `sink_dependency` state were no longer
serialized by `queue_lock`, so a producer could observe its sink
dependency as blocked even after the queue had already become empty,
leaving no future push/pop to wake it. This patch updates
`sink_dependency->set_ready()` and `sink_dependency->block()` while
holding `queue_lock`, keeping queue occupancy and sink readiness
transitions atomic with respect to each other.

Related PR: apache#62947

(cherry picked from commit 17bbba4)

3 participants