[fix](pipeline) avoid data queue sink dependency lost wakeup (#63055)
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
/review
Request changes.
The PR changes only DataQueue locking/notification behavior, and it regresses the previous lock-order separation around Dependency notifications.
Critical checkpoint conclusions:
- Goal/test: the change appears intended to simplify queue operations, but no targeted concurrency test demonstrates that dependency notifications remain safe.
- Scope: small and focused, but it changes concurrency semantics in a hot pipeline path.
- Concurrency: applicable. Source/sink pipeline tasks share SubQueue::queue_lock and Dependency state. Calling Dependency::set_ready()/block() while holding queue_lock creates nested lock ordering and should be kept outside queue_lock.
- Lifecycle/static initialization: not applicable.
- Configuration/compatibility/protocol/persistence/data writes: not applicable.
- Parallel paths: clear_blocks() and set_source_block() still deliberately notify outside their queue/source locks; try_pop()/try_push() should follow the same pattern.
- Error handling/memory/observability/performance: no additional issues found beyond the lock-order/contention regression.
- Tests: no new test coverage for this concurrency behavior was found.
- User focus: no additional user-provided review focus.
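The reviewer's point about keeping `Dependency::set_ready()`/`block()` outside `queue_lock` can be sketched with a minimal, hypothetical pair of classes (these are illustrative stand-ins, not the actual Doris `Dependency`/`SubQueue` API): the occupancy decision is made under `queue_lock`, but the notification runs only after the lock is released, so no nested lock ordering exists between `queue_lock` and the dependency's internal lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a pipeline Dependency: set_ready() takes
// its own internal lock and may wake a waiting pipeline task.
struct Dependency {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    void set_ready() {
        {
            std::lock_guard<std::mutex> l(m);
            ready = true;
        }
        cv.notify_all();  // wake any task waiting on this dependency
    }
};

// Hypothetical stand-in for a SubQueue following the pattern the
// reviewer cites from clear_blocks()/set_source_block().
struct SubQueue {
    std::mutex queue_lock;
    std::deque<int> blocks;
    Dependency* source_dep = nullptr;

    void try_push(int block) {
        bool notify = false;
        {
            std::lock_guard<std::mutex> l(queue_lock);
            notify = blocks.empty();  // empty -> non-empty transition
            blocks.push_back(block);
        }
        // Notify AFTER queue_lock is released: the dependency's lock is
        // never acquired while queue_lock is held, avoiding nested
        // lock ordering and contention on the hot queue path.
        if (notify && source_dep != nullptr) {
            source_dep->set_ready();
        }
    }
};
```

The trade-off, which the PR description below addresses, is that releasing `queue_lock` before the notification opens a window in which queue occupancy and dependency readiness can be observed out of sync.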
/review
Reviewed PR 63055 against c5883bb4ae11c7b477052d6332cc9bb3da76a40d..9b9fa843898f6387160bf4392a45cc9e5b294da5.
No new distinct blocking issue found beyond the already-open inline thread on SubQueue::try_pop()/try_push() dependency notification while holding queue_lock. I did not duplicate that existing thread.
Critical checkpoint conclusions:
- Goal and tests: The PR aims to avoid a lost wakeup in the DataQueue sink dependency state by serializing queue occupancy and sink readiness transitions. The modified path is covered by the existing `DataQueueTest.MultiTest`, but the PR body does not show a completed test run.
- Scope: The change is small and focused on `be/src/exec/operator/data_queue.cpp`.
- Concurrency: The modified code is concurrency-sensitive between sink producer pipeline tasks and source consumer pipeline tasks. Per-subqueue state remains guarded by `queue_lock`; source dependency transitions remain guarded by `_source_lock`. The existing review thread already covers the main concern about invoking `Dependency::{set_ready,block}` under `queue_lock`.
- Lifecycle/static initialization: No new static/global lifecycle issue found.
- Configuration: No configuration item added.
- Compatibility/storage/transactions: No serialization, storage-format, FE-BE protocol, or transaction persistence change found.
- Parallel code paths: The affected DataQueue paths are used by union/cache shared-state operators; no separate parallel implementation requiring the same local edit was found.
- Special condition checks: No new special defensive condition identified.
- Test result files: No regression `.out` files changed.
- Observability: No new observability needed for this narrowly scoped scheduling fix.
- Performance: No new clear performance issue found aside from the lock/notification concern already being discussed.
- User focus: .opencode-review.6swglQ/review_focus.txt says there is no additional user-provided review focus.
run nonConcurrent
…63055) ### What problem does this PR solve? Issue Number: N/A Problem Summary: `DataQueueTest.MultiTest` could intermittently hang after DataQueue moved sink dependency notifications outside the per-sub-queue lock. Root cause: `SubQueue` queue state and `sink_dependency` state were no longer serialized by `queue_lock`, so a producer could observe its sink dependency as blocked even after the queue had already become empty, leaving no future push/pop to wake it. This patch calls `sink_dependency->set_ready()` and `sink_dependency->block()` while holding `queue_lock`, keeping queue occupancy and sink readiness transitions atomic with respect to each other. Related PR: apache#62947 (cherry picked from commit 17bbba4)
What problem does this PR solve?
Issue Number: N/A
Problem Summary:
`DataQueueTest.MultiTest` could intermittently hang after DataQueue moved sink dependency notifications outside the per-sub-queue lock. Root cause: `SubQueue` queue state and `sink_dependency` state were no longer serialized by `queue_lock`, so a producer could observe its sink dependency as blocked even after the queue had already become empty, leaving no future push/pop to wake it. This patch calls `sink_dependency->set_ready()` and `sink_dependency->block()` while holding `queue_lock`, keeping queue occupancy and sink readiness transitions atomic with respect to each other.

Related PR: #62947
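The fix described above can be sketched with a minimal, hypothetical model (illustrative names, not the actual Doris classes): because the sink dependency's `block()`/`set_ready()` transitions now happen while `queue_lock` is held, a producer can never see "blocked" for a queue that a consumer has already drained, which is exactly the lost-wakeup window the PR closes.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical stand-in for the sink-side dependency: a simple
// ready/blocked flag that pipeline scheduling would consult.
struct SinkDependency {
    std::atomic<bool> ready{true};
    void set_ready() { ready.store(true); }
    void block() { ready.store(false); }
};

// Hypothetical SubQueue: occupancy and sink readiness both flip under
// queue_lock, so the two states stay atomic with respect to each other.
struct SubQueue {
    std::mutex queue_lock;
    std::deque<int> blocks;
    std::size_t high_watermark = 2;  // assumed backpressure threshold
    SinkDependency sink_dep;

    void try_push(int block) {
        std::lock_guard<std::mutex> l(queue_lock);
        blocks.push_back(block);
        if (blocks.size() >= high_watermark) {
            // Under queue_lock: blocking the sink is atomic with the
            // occupancy check, so no pop can race in between.
            sink_dep.block();
        }
    }

    bool try_pop(int* out) {
        std::lock_guard<std::mutex> l(queue_lock);
        if (blocks.empty()) {
            return false;
        }
        *out = blocks.front();
        blocks.pop_front();
        // Under queue_lock: the producer cannot observe the drained
        // queue while the sink is still marked blocked.
        sink_dep.set_ready();
        return true;
    }
};
```

The design trade-off relative to the earlier review concern is deliberate: holding `queue_lock` across the dependency transition adds a small amount of contention on this hot path, but removes the window in which queue occupancy and sink readiness could diverge and leave a producer waiting forever.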
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)