[Improvement](scan) Update scanner limit controller #61617

Open
BiteTheDDDDt wants to merge 1 commit into apache:master from BiteTheDDDDt:dev_0323

Conversation

@BiteTheDDDDt
Contributor

@BiteTheDDDDt BiteTheDDDDt commented Mar 23, 2026

Summary

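Currently, each Scanner holds its own _limit (equal to the SQL LIMIT value) and counts rows independently. When N concurrent Scanners (about 4 by default) execute SELECT * FROM t WHERE x=1 LIMIT 10, they read 4×10=40 rows in total, 30 of which are discarded by the upper layer, wasting IO and CPU.

This PR introduces a shared atomic counter, _remaining_limit, at the ScannerContext level. All concurrent Scanners compete for quota through CAS operations, so the total number of rows produced by all Scanners does not significantly exceed the LIMIT value.

Changes

Shared quota mechanism (scanner_context.h / scanner_context.cpp)

  • Added a std::atomic<int64_t> _remaining_limit field next to the existing limit field, initialized to the SQL LIMIT value (-1 means no LIMIT).
  • Added acquire_limit_quota(desired), implemented as a CAS loop: a Scanner requests desired rows and receives the number of rows actually granted (0 means the quota is exhausted). When there is no LIMIT, the request is granted in full.
  • Added a remaining_limit() accessor.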
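
As a rough illustration of the shared quota mechanism described above, here is a minimal standalone sketch of a CAS-based quota counter. The class name `LimitQuota` is hypothetical, and the memory orderings are an assumption; this is not the code from the PR, only one way such a loop can be written.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the quota-related part of ScannerContext.
class LimitQuota {
public:
    // Negative limits are normalized to -1, meaning "no LIMIT".
    explicit LimitQuota(int64_t limit) : _remaining_limit(limit < 0 ? -1 : limit) {}

    // Try to claim up to `desired` rows. Returns the number of rows granted;
    // 0 means the shared quota is exhausted.
    int64_t acquire_limit_quota(int64_t desired) {
        assert(desired >= 0);
        int64_t remaining = _remaining_limit.load(std::memory_order_acquire);
        while (true) {
            if (remaining < 0) {
                return desired; // no LIMIT: grant everything
            }
            if (remaining == 0) {
                return 0; // quota exhausted
            }
            const int64_t granted = std::min(desired, remaining);
            // On failure, compare_exchange_weak reloads `remaining`, so the
            // next iteration recomputes the grant from the fresh value.
            if (_remaining_limit.compare_exchange_weak(remaining, remaining - granted,
                                                       std::memory_order_acq_rel,
                                                       std::memory_order_acquire)) {
                return granted;
            }
        }
    }

    int64_t remaining_limit() const {
        return _remaining_limit.load(std::memory_order_acquire);
    }

private:
    std::atomic<int64_t> _remaining_limit;
};
```

With a limit of 10 and successive requests for 4 rows, this sketch grants 4, 4, 2 and then 0, so the grants sum to exactly the limit.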

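Hang prevention in the scheduling layer (scanner_context.cpp)

  • _pull_next_scan_task(): when _remaining_limit == 0, no new Scanner is submitted from the pending queue, which avoids useless scheduling.
  • get_block_from_queue(): the completion condition is extended. Previously, finished was set only when all Scanners had completed; now it is also set when the quota is exhausted, no Scanner is running, and the queue is empty. The two conditions are merged into a single if block. This prevents a pipeline deadlock caused by pending Scanners never being scheduled.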
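
The extended completion condition can be sketched as a pure predicate over a snapshot of the context state. The `ScanState` struct and its field names are hypothetical and only loosely modelled on the names mentioned in this PR; the real check runs inside get_block_from_queue() under the context's lock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the state examined when deciding completion.
struct ScanState {
    int64_t remaining_limit;    // -1 means no LIMIT
    int num_scheduled_scanners; // scanners currently running or in flight
    int num_finished_scanners;
    int num_total_scanners;
    bool queue_empty;           // no produced blocks waiting to be consumed
};

// True if the scan should be marked finished.
bool should_mark_finished(const ScanState& s) {
    assert(s.num_finished_scanners <= s.num_total_scanners);
    // Original condition: every scanner has completed.
    const bool all_scanners_done = s.num_finished_scanners == s.num_total_scanners;
    // Added condition: quota is used up, nothing is running, nothing is queued.
    // Pending scanners will never be scheduled, so waiting for them would hang.
    const bool quota_exhausted =
            s.remaining_limit == 0 && s.num_scheduled_scanners == 0 && s.queue_empty;
    return all_scanners_done || quota_exhausted;
}
```

The second term matters precisely when some scanners are still pending: without it, a context with exhausted quota and unscheduled scanners would never report completion.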

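Quota checks during Scanner execution (scanner_scheduler.cpp)

  • Top of the loop: alongside guards such as ctx->done(), the loop breaks early and sets eos when remaining_limit() == 0, so a Scanner with no quota left does no further IO.
  • After each block is produced: acquire_limit_quota(block_rows) is called to request quota:
    granted == 0 → discard the block, set eos
    granted < block_rows → truncate with block->set_num_rows(granted), set eos
    granted == block_rows → continue normally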
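
The three outcomes above can be sketched as follows. A `std::vector<int>` stands in for a block of rows, and `BlockOutcome` and `apply_granted_quota` are hypothetical names for illustration; the PR itself truncates with block->set_num_rows(granted).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result of applying a quota grant to one block.
struct BlockOutcome {
    bool keep_block; // false: the block is discarded
    bool eos;        // true: the scanner should stop scanning
};

// `granted` is assumed to be the value returned by
// acquire_limit_quota(block_rows) for this (non-empty) block.
BlockOutcome apply_granted_quota(std::vector<int>& block, int64_t granted) {
    assert(!block.empty());
    const int64_t block_rows = static_cast<int64_t>(block.size());
    assert(granted >= 0 && granted <= block_rows);
    if (granted == 0) {
        block.clear(); // quota exhausted: drop the block
        return {false, true};
    }
    if (granted < block_rows) {
        block.resize(static_cast<std::size_t>(granted)); // keep only granted rows
        return {true, true};
    }
    return {true, false}; // full grant: keep scanning
}
```

Note that a partial grant also ends the scan: once a scanner receives fewer rows than it asked for, the shared quota is known to be zero.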

Other

  • debug_string() output now includes remaining_limit to make troubleshooting easier.

This pull request introduces a shared, atomic row limit mechanism in the scanner context to ensure that concurrent scanners collectively respect the SQL LIMIT clause. The main changes implement a thread-safe, centrally managed quota for remaining rows, preventing over-scanning and efficiently coordinating concurrent scanner threads. Additionally, related logic is updated to stop or throttle scanners when the quota is exhausted and to provide improved debug information.

Shared limit management and enforcement:

  • Added a new atomic member _remaining_limit to ScannerContext, representing the shared remaining row limit across all scanners, and initialized it appropriately. Provided an acquire_limit_quota() method for atomically claiming rows from this quota.
  • Updated scanner scheduling logic to check and respect the shared limit before launching new scan tasks or continuing scanning, ensuring no new work is scheduled if the limit is exhausted.

Block quota enforcement and block truncation:

  • In the scanner execution loop, after reading a block, scanners now atomically acquire quota for the number of rows in the block. If quota is exhausted, the block is discarded or truncated to the permitted row count, and scanning stops.

Completion and lifecycle management:

  • Modified context completion logic to also mark the scan as finished if the shared limit is exhausted and no scanners are running, ensuring correct query termination.

Debugging and observability:

  • Enhanced the debug_string() output of ScannerContext to include the current value of remaining_limit, aiding in diagnostics and monitoring.

Small-limit optimization:

  • Retained and clarified the optimization for scanners with a small per-scanner limit, ensuring they return early to avoid unnecessary data scanning.

Copilot AI review requested due to automatic review settings March 23, 2026 06:50
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

This PR improves BE scan execution by introducing a shared, atomic “remaining rows” quota in ScannerContext so that multiple concurrent scanners collectively respect a query-level SQL LIMIT, preventing over-returning rows when scanners run in parallel.

Changes:

  • Added _remaining_limit and an atomic acquire_limit_quota() mechanism in ScannerContext to centrally manage remaining rows across scanners.
  • Enforced the shared limit in the scanner execution loop by truncating/discarding blocks once quota is exhausted and stopping further scanning.
  • Updated scheduling/completion and debug_string() to reflect shared-limit exhaustion and improve observability.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| be/src/exec/scan/scanner_scheduler.cpp | Stops scanning when shared quota is exhausted; truncates/discards blocks based on atomically acquired quota. |
| be/src/exec/scan/scanner_context.h | Adds atomic shared limit state and quota-acquisition API used by the scheduler. |
| be/src/exec/scan/scanner_context.cpp | Implements atomic quota acquisition; updates finished-condition and debug output; prevents launching new scanners when quota is 0. |

@zclllyybb
Contributor

/review

Contributor

@github-actions github-actions bot left a comment

Code Review Summary

This PR introduces a shared atomic _remaining_limit counter in ScannerContext so that concurrent scanners collectively respect the SQL LIMIT clause, avoiding the current behavior where N scanners each independently produce up to LIMIT rows (N×LIMIT total).

Critical Checkpoint Conclusions

1. Goal and correctness: The goal is to reduce wasted I/O and CPU when concurrent scanners over-produce rows for LIMIT queries. The approach is sound: a lock-free CAS loop (acquire_limit_quota) atomically grants row quotas across all scanner threads. The total rows granted can never exceed the initial limit. The operator-level reached_limit() remains as a redundant safety net.

2. Modification scope: Changes are focused and minimal (3 files, 68 additions). The scope is appropriate.

3. Concurrency analysis:

  • _remaining_limit is an std::atomic<int64_t> using acquire/release semantics for lifecycle-critical reads/writes. The CAS loop in acquire_limit_quota is correct.
  • _num_scheduled_scanners is always read and written under _transfer_lock, making the completion check at get_block_from_queue line 362 safe. There is no race where a scanner has consumed quota but hasn't delivered its results (because _num_scheduled_scanners > 0 while any scanner is in-flight).
  • Memory ordering is appropriate: acquire for reads, acq_rel for CAS, relaxed only for the diagnostic debug_string() read.
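
The claim that concurrent grants never exceed the initial limit can be exercised with a small standalone harness. This is a sketch under stated assumptions: `acquire_rows` and `total_rows_granted` are hypothetical helpers, threads stand in for scanners, and the orderings mirror those described in this review rather than the PR's exact code.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CAS loop: claim up to `desired` rows from a non-negative quota.
int64_t acquire_rows(std::atomic<int64_t>& remaining_limit, int64_t desired) {
    int64_t remaining = remaining_limit.load(std::memory_order_acquire);
    while (remaining > 0) {
        const int64_t granted = std::min(desired, remaining);
        if (remaining_limit.compare_exchange_weak(remaining, remaining - granted,
                                                  std::memory_order_acq_rel,
                                                  std::memory_order_acquire)) {
            return granted;
        }
    }
    return 0; // quota exhausted
}

// Run `num_scanners` threads that each keep claiming `block_rows` rows until
// the quota is gone, and return the total number of rows granted.
int64_t total_rows_granted(int64_t limit, int num_scanners, int64_t block_rows) {
    assert(limit >= 0 && num_scanners > 0 && block_rows > 0);
    std::atomic<int64_t> remaining_limit{limit};
    std::atomic<int64_t> total{0};
    std::vector<std::thread> scanners;
    for (int i = 0; i < num_scanners; ++i) {
        scanners.emplace_back([&] {
            while (true) {
                const int64_t granted = acquire_rows(remaining_limit, block_rows);
                if (granted == 0) {
                    break;
                }
                total.fetch_add(granted, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : scanners) {
        t.join();
    }
    return total.load();
}
```

Because every successful CAS subtracts exactly the granted amount from the shared counter, the grants sum to the initial limit regardless of how the threads interleave.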

4. Lifecycle / completion logic: The new finish condition (_remaining_limit == 0 && _num_scheduled_scanners == 0) correctly handles the case where pending scanners are never scheduled because the limit is exhausted. The existing DCHECK at schedule_scan_task lines 548-554 cannot fire because _get_margin() always returns >= 1 when _tasks_queue and _num_scheduled_scanners are both 0.

5. No incompatible changes: No new configuration items, no serialization/format changes, no FE-BE protocol changes.

6. Parallel code paths: The remaining_limit() check at the top of the scan loop (line 236) and the acquire_limit_quota() after reading a block (line 281) are complementary — the former is an optimization to avoid unnecessary I/O, the latter is the binding enforcement.

7. Test coverage: No tests are included in this PR. This is a behavioral optimization that affects concurrent scanner coordination under LIMIT. A regression test (e.g., verifying that a multi-scanner LIMIT query does not read significantly more rows than the limit) would strengthen confidence. At minimum, the existing regression test suite should be run to confirm no regressions.

8. Observability: debug_string() is updated to include remaining_limit, which is good for diagnostics.

9. Performance: The CAS loop is O(1) amortized and lock-free. The early-exit check (remaining_limit() == 0) before I/O is a good optimization. No performance concerns.

Issues Found

  1. [Medium] limit == 0 edge case — potential hang if FE ever passes limit_=0: The constructor handles limit < 0 (normalizing to -1) but has no handling for limit == 0. If limit_ == 0 were passed, _remaining_limit would start at 0, _pull_next_scan_task would refuse to schedule any scanners, the dependency would remain blocked, and the query would hang. While the FE's EliminateLimit rule currently eliminates LIMIT 0 before it reaches BE, the BE code should be self-consistent. See inline comment.

  2. [Low] Missing regression test: The PR has no test coverage. A targeted regression test for the shared limit behavior would be valuable.

Verdict

The core mechanism is correct and well-designed. The CAS loop, the completion logic, and the concurrency safety are all sound. The limit == 0 edge case is a defensive hardening concern (not a practical bug today), and the missing tests are worth addressing.

};
if (limit < 0) {
    limit = -1;
    _remaining_limit.store(-1, std::memory_order_relaxed);

[Medium] Missing limit == 0 handling. The constructor handles limit < 0 (normalizing to -1) but does not handle limit == 0. If limit_=0 were ever passed:

  • _remaining_limit starts at 0
  • _pull_next_scan_task refuses to schedule any scanner (remaining == 0 check)
  • No scanner runs, so push_back_scan_task is never called, _dependency->set_ready() is never invoked
  • The scan operator is permanently blocked → query hangs

While the FE's EliminateLimit rule currently prevents LIMIT 0 from reaching the scan layer, the BE code should be self-consistent. Consider adding a guard in init() similar to the existing empty-scanners check:

if (limit == 0) {
    _is_finished = true;
    _set_scanner_done();
    return Status::OK();
}

Or at minimum handle it in the constructor alongside the limit < 0 case.
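
One possible shape for the constructor-side handling is to normalize the incoming limit into an explicit result that also says whether the scan is already complete. `NormalizedLimit` and `normalize_limit` are hypothetical names for illustration; how the "finished at init" flag would be wired into ScannerContext is left open here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of normalizing the limit passed to the constructor.
struct NormalizedLimit {
    int64_t remaining_limit; // -1 means no LIMIT
    bool finished_at_init;   // true: nothing to scan, mark finished immediately
};

NormalizedLimit normalize_limit(int64_t limit) {
    if (limit < 0) {
        return {-1, false}; // any negative value means "no LIMIT"
    }
    if (limit == 0) {
        // No rows are wanted. No scanner would ever be scheduled, so the
        // context must be marked finished up front instead of waiting.
        return {0, true};
    }
    return {limit, false};
}
```

Treating limit == 0 as an explicit third case keeps the "quota exhausted" state from being reachable before any scanner has had the chance to signal the dependency.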

// -1 means no limit. Scanners call acquire_limit_quota() to claim rows.
std::atomic<int64_t> _remaining_limit;
// Atomically acquire up to `desired` rows. Returns actual granted count (0 = exhausted).
int64_t acquire_limit_quota(int64_t desired);

[Nit] Access specifier placement. acquire_limit_quota() and remaining_limit() are declared in the protected section but called from ScannerScheduler::_scanner_scan() (a friend class) and the scanner scan loop. Since ScannerScheduler is already a friend, this works, but consider whether these should be public for clarity — especially remaining_limit() which is a simple accessor with no side effects.
