
[Improvement](scan) Update scanner limit controller (#61617)#61962

Merged
yiguolei merged 1 commit into apache:branch-4.1 from
BiteTheDDDDt:cp_0331_2
Apr 1, 2026

Conversation

@BiteTheDDDDt
Contributor

Cherry-picked from #61617.

This pull request introduces a shared atomic limit mechanism in the
scanner execution context to more accurately enforce SQL `LIMIT`
constraints across multiple concurrent scanners. The changes ensure that
the row limit is respected globally, preventing over-scanning and
improving resource efficiency. Key updates include the introduction of a
thread-safe quota system, modifications to scanner scheduling and task
pulling logic, and enhancements to debugging output.

**Shared Limit Enforcement and Quota Management:**

* Added a new atomic member `_remaining_limit` to `ScannerContext`,
initialized with the SQL `LIMIT` value and decremented atomically as
rows are scanned. Introduced the `acquire_limit_quota()` method for
scanners to claim their share of the remaining quota.
[[1]](diffhunk://#diff-0c9a817d45d8130ea3211189e1321d1275e22fd4a9a3fac2bd707b1cfeefa5e5R74)
[[2]](diffhunk://#diff-0c9a817d45d8130ea3211189e1321d1275e22fd4a9a3fac2bd707b1cfeefa5e5R99)
[[3]](diffhunk://#diff-3049f42cade971254aae07ced700d9a10b2505b03da743efea3270e63bd88dceR224-R229)
* Modified the scanner scheduling logic in
`ScannerScheduler::_scanner_scan` to check and acquire quota before
processing blocks, ensuring that scanners stop or truncate blocks when
the shared limit is exhausted.
[[1]](diffhunk://#diff-ecdf52f3fb33b9018cc1aff92085e470087071b25d79efed8a849a289215d05fR235-R239)
[[2]](diffhunk://#diff-ecdf52f3fb33b9018cc1aff92085e470087071b25d79efed8a849a289215d05fR276-R292)
* Updated the pending scanner task pulling logic to avoid scheduling new
scanners when the shared limit is depleted.
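
The quota mechanism described above can be sketched as a compare-and-swap loop over one atomic counter. This is a minimal illustration, not the code from the PR: the class name `SharedLimit`, the `desired` parameter, the return convention, and the `remaining()` accessor are assumptions made for the example. Only `_remaining_limit` and `acquire_limit_quota()` are names taken from the description. The sketch also assumes a non-negative limit; how the real code represents "no limit" is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared quota held by ScannerContext.
class SharedLimit {
public:
    // Assumption: `limit` is the SQL LIMIT value and is non-negative.
    explicit SharedLimit(int64_t limit) : _remaining_limit(limit) {}

    // Claims up to `desired` rows from the shared quota.
    // Returns the number of rows actually granted (0 once the quota is used up).
    int64_t acquire_limit_quota(int64_t desired) {
        assert(desired >= 0);
        int64_t remaining = _remaining_limit.load(std::memory_order_acquire);
        while (remaining > 0) {
            const int64_t granted = remaining < desired ? remaining : desired;
            // On failure, compare_exchange_weak reloads `remaining`,
            // so the loop retries against the latest value.
            if (_remaining_limit.compare_exchange_weak(remaining, remaining - granted,
                                                       std::memory_order_acq_rel,
                                                       std::memory_order_acquire)) {
                return granted;
            }
        }
        return 0;
    }

    int64_t remaining() const {
        return _remaining_limit.load(std::memory_order_acquire);
    }

private:
    std::atomic<int64_t> _remaining_limit;
};
```

Under these assumptions, a scanner that produced a block of `n` rows would call `acquire_limit_quota(n)`, keep only the granted number of rows, and stop when the call returns 0. Because the counter never goes below zero, the total granted across all scanners cannot exceed the original limit.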

**Scanner Completion and Task Management:**

* Changed when the scanner context is marked as finished: completion
is now triggered either when all scanners are done, or when the shared
limit is exhausted and no scanners are still running.
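
The completion rule above can be written as a single predicate. The sketch below is illustrative only: the `ScanState` struct, its field names, and `should_mark_finished` are hypothetical and are not taken from the Doris code base.

```cpp
#include <cstdint>

// Hypothetical snapshot of the state the scheduler would inspect.
struct ScanState {
    bool has_limit;            // assumption: true when the query carries a LIMIT
    int64_t remaining_limit;   // rows still allowed by the shared limit
    int num_running_scanners;  // scanners currently executing
    int num_finished_scanners; // scanners that reached end of data
    int num_total_scanners;    // all scanners owned by the context
};

// Returns true when the context can be marked finished:
// either every scanner is done, or the shared limit is used up
// and nothing is still running.
inline bool should_mark_finished(const ScanState& s) {
    const bool all_done = s.num_finished_scanners == s.num_total_scanners;
    const bool limit_exhausted_and_idle =
            s.has_limit && s.remaining_limit == 0 && s.num_running_scanners == 0;
    return all_done || limit_exhausted_and_idle;
}
```

The check on running scanners matters in this sketch: a scanner that already claimed quota may still be delivering its rows, so finishing while it runs could drop them.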

**Debugging and Observability:**

* Improved the `debug_string()` output in `ScannerContext` to include
the current value of `remaining_limit`, aiding in troubleshooting and
monitoring.

**Performance Optimization:**

* Clarified and preserved the per-scanner small-limit optimization,
ensuring that when the limit is smaller than the batch size, scanners
return early to avoid unnecessary data processing.
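
As a rough illustration of the small-limit optimization, the sketch below counts how many blocks a scanner would read in one scheduling round. Everything in it is an assumption made for the example: the function name, the parameters, the byte threshold that normally ends a round, and the convention that a non-positive `limit` means "no limit". It is not the logic from `ScannerScheduler::_scanner_scan`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one scheduling round for a single scanner.
// block_bytes:     size in bytes of each block the scanner could read, in order
// limit:           per-scanner row limit (assumption: <= 0 means no limit)
// batch_size:      maximum rows per block
// bytes_threshold: bytes after which a normal round stops reading
inline int blocks_read_in_round(const std::vector<int64_t>& block_bytes, int64_t limit,
                                int64_t batch_size, int64_t bytes_threshold) {
    int blocks = 0;
    int64_t bytes = 0;
    for (int64_t b : block_bytes) {
        ++blocks;
        bytes += b;
        // Small limit: one block can already satisfy it, so return early
        // instead of reading until the byte threshold is reached.
        if (limit > 0 && limit < batch_size) {
            break;
        }
        if (bytes >= bytes_threshold) {
            break;
        }
    }
    return blocks;
}
```

With three 100-byte blocks, a batch size of 4096, and a threshold of 250 bytes, this model reads one block when the limit is 10 and three blocks when there is no limit.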
@BiteTheDDDDt BiteTheDDDDt requested a review from yiguolei as a code owner March 31, 2026 12:27
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 20.37% (11/54) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.83% (19678/37246)
Line Coverage 36.23% (184203/508474)
Region Coverage 32.53% (142519/438158)
Branch Coverage 33.70% (62619/185813)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (52/52) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.07% (25907/36455)
Line Coverage 53.86% (272963/506769)
Region Coverage 51.26% (226638/442120)
Branch Coverage 52.65% (98114/186341)

@yiguolei yiguolei merged commit d90ac33 into apache:branch-4.1 Apr 1, 2026
27 of 29 checks passed
yiguolei added a commit that referenced this pull request Apr 8, 2026
BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Apr 10, 2026
BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Apr 14, 2026