[Improvement](scan) Update scanner limit controller#61617
[Improvement](scan) Update scanner limit controller#61617BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR improves BE scan execution by introducing a shared, atomic “remaining rows” quota in ScannerContext so that multiple concurrent scanners collectively respect a query-level SQL LIMIT, preventing over-returning rows when scanners run in parallel.
Changes:
- Added
_remaining_limitand an atomicacquire_limit_quota()mechanism inScannerContextto centrally manage remaining rows across scanners. - Enforced the shared limit in the scanner execution loop by truncating/discarding blocks once quota is exhausted and stopping further scanning.
- Updated scheduling/completion and
debug_string()to reflect shared-limit exhaustion and improve observability.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| be/src/exec/scan/scanner_scheduler.cpp | Stops scanning when shared quota is exhausted; truncates/discards blocks based on atomically acquired quota. |
| be/src/exec/scan/scanner_context.h | Adds atomic shared limit state and quota-acquisition API used by the scheduler. |
| be/src/exec/scan/scanner_context.cpp | Implements atomic quota acquisition; updates finished-condition and debug output; prevents launching new scanners when quota is 0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/review |
There was a problem hiding this comment.
Code Review Summary
This PR introduces a shared atomic _remaining_limit counter in ScannerContext so that concurrent scanners collectively respect the SQL LIMIT clause, avoiding the current behavior where N scanners each independently produce up to LIMIT rows (N×LIMIT total).
Critical Checkpoint Conclusions
1. Goal and correctness: The goal is to reduce wasted I/O and CPU when concurrent scanners over-produce rows for LIMIT queries. The approach is sound: a lock-free CAS loop (acquire_limit_quota) atomically grants row quotas across all scanner threads. The total rows granted can never exceed the initial limit. The operator-level reached_limit() remains as a redundant safety net.
2. Modification scope: Changes are focused and minimal (3 files, 68 additions). The scope is appropriate.
3. Concurrency analysis:
_remaining_limitis anstd::atomic<int64_t>usingacquire/releasesemantics for lifecycle-critical reads/writes. The CAS loop inacquire_limit_quotais correct._num_scheduled_scannersis always read and written under_transfer_lock, making the completion check atget_block_from_queueline 362 safe. There is no race where a scanner has consumed quota but hasn't delivered its results (because_num_scheduled_scanners > 0while any scanner is in-flight).- Memory ordering is appropriate:
acquirefor reads,acq_relfor CAS,relaxedonly for the diagnosticdebug_string()read.
4. Lifecycle / completion logic: The new finish condition (_remaining_limit == 0 && _num_scheduled_scanners == 0) correctly handles the case where pending scanners are never scheduled because the limit is exhausted. The existing DCHECK at schedule_scan_task lines 548-554 cannot fire because _get_margin() always returns >= 1 when _tasks_queue and _num_scheduled_scanners are both 0.
5. No incompatible changes: No new configuration items, no serialization/format changes, no FE-BE protocol changes.
6. Parallel code paths: The remaining_limit() check at the top of the scan loop (line 236) and the acquire_limit_quota() after reading a block (line 281) are complementary — the former is an optimization to avoid unnecessary I/O, the latter is the binding enforcement.
7. Test coverage: No tests are included in this PR. This is a behavioral optimization that affects concurrent scanner coordination under LIMIT. A regression test (e.g., verifying that a multi-scanner LIMIT query does not read significantly more rows than the limit) would strengthen confidence. At minimum, the existing regression test suite should be run to confirm no regressions.
8. Observability: debug_string() is updated to include remaining_limit, which is good for diagnostics.
9. Performance: The CAS loop is O(1) amortized and lock-free. The early-exit check (remaining_limit() == 0) before I/O is a good optimization. No performance concerns.
Issues Found
-
[Medium]
limit == 0edge case — potential hang if FE ever passeslimit_=0: The constructor handleslimit < 0(normalizing to -1) but has no handling forlimit == 0. Iflimit_ == 0were passed,_remaining_limitwould start at 0,_pull_next_scan_taskwould refuse to schedule any scanners, the dependency would remain blocked, and the query would hang. While the FE'sEliminateLimitrule currently eliminatesLIMIT 0before it reaches BE, the BE code should be self-consistent. See inline comment. -
[Low] Missing regression test: The PR has no test coverage. A targeted regression test for the shared limit behavior would be valuable.
Verdict
The core mechanism is correct and well-designed. The CAS loop, the completion logic, and the concurrency safety are all sound. The limit == 0 edge case is a defensive hardening concern (not a practical bug today), and the missing tests are worth addressing.
| }; | ||
| if (limit < 0) { | ||
| limit = -1; | ||
| _remaining_limit.store(-1, std::memory_order_relaxed); |
There was a problem hiding this comment.
[Medium] Missing limit == 0 handling. The constructor handles limit < 0 (normalizing to -1) but does not handle limit == 0. If limit_=0 were ever passed:
_remaining_limitstarts at 0_pull_next_scan_taskrefuses to schedule any scanner (remaining == 0 check)- No scanner runs, so
push_back_scan_taskis never called,_dependency->set_ready()is never invoked - The scan operator is permanently blocked → query hangs
While the FE's EliminateLimit rule currently prevents LIMIT 0 from reaching the scan layer, the BE code should be self-consistent. Consider adding a guard in init() similar to the existing empty-scanners check:
if (limit == 0) {
_is_finished = true;
_set_scanner_done();
return Status::OK();
}Or at minimum handle it in the constructor alongside the limit < 0 case.
| // -1 means no limit. Scanners call acquire_limit_quota() to claim rows. | ||
| std::atomic<int64_t> _remaining_limit; | ||
| // Atomically acquire up to `desired` rows. Returns actual granted count (0 = exhausted). | ||
| int64_t acquire_limit_quota(int64_t desired); |
There was a problem hiding this comment.
[Nit] Access specifier placement. acquire_limit_quota() and remaining_limit() are declared in the protected section but called from ScannerScheduler::_scanner_scan() (a friend class) and the scanner scan loop. Since ScannerScheduler is already a friend, this works, but consider whether these should be public for clarity — especially remaining_limit() which is a simple accessor with no side effects.
Summary
当前每个 Scanner 独立持有 _limit(等于 SQL LIMIT 值)并独立计数。N 个并发 Scanner(默认约 4 个)执行 SELECT * FROM t WHERE x=1 LIMIT 10 时,总共会读出 4×10=40 行,其中 30 行被上层丢弃,浪费了 IO 和 CPU。
本 PR 在 ScannerContext 层面引入一个共享的原子计数器 _remaining_limit,所有并发 Scanner 通过 CAS 操作竞争消费配额,使得所有 Scanner 合计产出的行数不显著超过 LIMIT 值。
改动说明
共享配额机制(scanner_context.h / scanner_context.cpp)
调度层防 hang(scanner_context.cpp)
Scanner 执行时的配额检查(scanner_scheduler.cpp)
其他
This pull request introduces a shared, atomic row limit mechanism in the scanner context to ensure that concurrent scanners collectively respect the SQL
LIMITclause. The main changes implement a thread-safe, centrally managed quota for remaining rows, preventing over-scanning and efficiently coordinating concurrent scanner threads. Additionally, related logic is updated to stop or throttle scanners when the quota is exhausted and to provide improved debug information.Shared limit management and enforcement:
_remaining_limittoScannerContext, representing the shared remaining row limit across all scanners, and initialized it appropriately. Provided anacquire_limit_quota()method for atomically claiming rows from this quota. [1] [2] [3]Block quota enforcement and block truncation:
Completion and lifecycle management:
Debugging and observability:
debug_string()output ofScannerContextto include the current value ofremaining_limit, aiding in diagnostics and monitoring.Small-limit optimization: