[Bug](profile) fix wrong count of scan wait worker timer (#61064) by BiteTheDDDDt · Pull Request #61691 · apache/doris

BiteTheDDDDt · 2026-03-25T02:50:22Z

pick from #61064

This pull request improves the accuracy and clarity of scanner timing and profiling in the `ScannerScheduler` implementation. The main changes clarify the order and logic for updating and recording timing counters, ensuring that CPU and wait times are measured more precisely during scanner execution and cleanup.

**Enhancements to scanner timing and profiling:**

* Added detailed comments explaining the correct order for setting and getting timing counters (`update_wait_worker_timer`, `start_scan_cpu_timer`, `update_scan_cpu_timer`, `start_wait_worker_timer`) to ensure accurate measurement of scanner CPU and wait times.
* Refactored the `Defer` cleanup logic to start the wait worker timer only if the scanner has not failed and has not reached the end-of-stream, preventing redundant or incorrect counter updates.
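The four-step ordering and the conditional `Defer` described above can be illustrated with a small, self-contained sketch. It is not the Doris implementation: `FakeClock`, `ScannerTimers`, `scan_slice` and the `Defer` class below are hypothetical stand-ins written for this example, and a manually advanced clock replaces real CPU and wall clocks so the numbers are deterministic. It follows the order spelled out in the PR's comment block, with step 3 placed in the deferred lambda.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical, manually advanced clock so the sketch is deterministic.
// Real code would read a monotonic or thread-CPU clock instead.
struct FakeClock {
    int64_t now_ns = 0;
    void advance(int64_t ns) { now_ns += ns; }
};

// Minimal RAII scope guard written for this sketch; it plays the role of the
// Defer helper used in the PR but is not Doris's implementation.
class Defer {
public:
    explicit Defer(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~Defer() {
        if (fn_) fn_();
    }
    Defer(const Defer&) = delete;
    Defer& operator=(const Defer&) = delete;

private:
    std::function<void()> fn_;
};

// Hypothetical timer pair; the method names mirror the PR, the bodies are
// illustrative. A start value of -1 means the timer is not running.
struct ScannerTimers {
    explicit ScannerTimers(FakeClock& c) : clk(c) {}

    FakeClock& clk;
    int64_t wait_worker_ns = 0;  // accumulated time queued for a worker thread
    int64_t scan_cpu_ns = 0;     // accumulated time spent scanning
    int64_t wait_start = -1;
    int64_t cpu_start = -1;

    void start_wait_worker_timer() { wait_start = clk.now_ns; }
    void update_wait_worker_timer() {
        if (wait_start >= 0) {
            wait_worker_ns += clk.now_ns - wait_start;
            wait_start = -1;
        }
    }
    void start_scan_cpu_timer() { cpu_start = clk.now_ns; }
    void update_scan_cpu_timer() {
        if (cpu_start >= 0) {
            scan_cpu_ns += clk.now_ns - cpu_start;
            cpu_start = -1;
        }
    }
};

// One scheduling slice that scans for `work_ns`; `ok` and `eos` stand in for
// status.ok() and the end-of-stream flag.
void scan_slice(ScannerTimers& t, int64_t work_ns, bool ok, bool eos) {
    t.update_wait_worker_timer();  // 1. close the wait interval that just ended
    t.start_scan_cpu_timer();      // 2. start timing this slice's work
    Defer defer_scanner([&] {
        t.update_scan_cpu_timer();  // 3. fold this slice's work into the total
        if (ok && !eos) {
            t.start_wait_worker_timer();  // 4. only a re-queued scanner waits again
        }
    });
    t.clk.advance(work_ns);  // simulated open()/get_block() work
}
```

With a scanner that waits 5 ns, scans 10 ns, waits 3 ns and then scans 7 ns to end-of-stream, this sketch reports 8 ns of worker wait and 17 ns of scan time, and the wait timer is left stopped after eos.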

Thearas · 2026-03-25T02:50:27Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

BiteTheDDDDt · 2026-03-25T02:50:38Z

run buildall

Copilot

Pull request overview

This PR adjusts scanner timing/profiling behavior in the BE vectorized scan path, aiming to make scanner CPU-time and worker-wait-time accounting more accurate in ScannerScheduler and to avoid crashes when realtime counters are updated before an OlapScanner is fully prepared.

Changes:

Adds/updates timing-related comments and refactors counter/timer update order in ScannerScheduler::_scanner_scan.
Changes deferred cleanup logic to conditionally start the wait-worker timer.
Makes OlapScanner::update_realtime_counters() a no-op when _has_prepared is false to avoid null dereferences.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `be/src/vec/exec/scan/scanner_scheduler.cpp` | Refactors scanner timing/profiling update flow and deferred cleanup behavior. |
| `be/src/vec/exec/scan/olap_scanner.cpp` | Guards realtime counter updates when prepare hasn’t completed. |


Copilot · 2026-03-25T02:54:10Z

be/src/vec/exec/scan/olap_scanner.cpp

+        // Counter update need prepare successfully, or it maybe core. For example, olap scanner
+        // will open tablet reader during prepare, if not prepare successfully, tablet reader == nullptr.


Typo/wording: "it maybe core" is unclear (likely meant "it may core dump" / "it may crash"). Consider rephrasing to a clearer description of the failure mode (e.g., counters depend on tablet reader being initialized).

Suggested change

- // Counter update need prepare successfully, or it maybe core. For example, olap scanner
- // will open tablet reader during prepare, if not prepare successfully, tablet reader == nullptr.
+ // Counter update requires successful prepare; otherwise it may crash. For example, OlapScanner
+ // opens the tablet reader during prepare, so if prepare fails, tablet reader == nullptr.
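The guard discussed in this comment can be sketched as follows. `SketchOlapScanner` and `FakeTabletReader` are hypothetical names; the only idea taken from the PR is that `update_realtime_counters()` returns early while the scanner is not prepared, because the tablet reader is only created by a successful prepare.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the tablet reader that prepare() opens.
struct FakeTabletReader {
    int64_t rows_read = 0;
};

// Illustrative scanner: names echo the PR, the bodies are assumptions.
class SketchOlapScanner {
public:
    // Returns false to simulate a failed prepare; the reader stays nullptr.
    bool prepare(bool succeed) {
        if (!succeed) {
            return false;
        }
        reader_ = std::make_unique<FakeTabletReader>();
        has_prepared_ = true;
        return true;
    }

    void read_rows(int64_t rows) { reader_->rows_read += rows; }

    // Counters come from the tablet reader, so this is a no-op until prepare()
    // has succeeded; without the guard, a null reader_ would be dereferenced.
    void update_realtime_counters() {
        if (!has_prepared_) {
            return;
        }
        reported_rows_ = reader_->rows_read;
    }

    int64_t reported_rows() const { return reported_rows_; }

private:
    std::unique_ptr<FakeTabletReader> reader_;
    bool has_prepared_ = false;
    int64_t reported_rows_ = 0;
};
```

Calling `update_realtime_counters()` on a scanner whose prepare failed is then harmless and leaves the counters at zero.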

Copilot · 2026-03-25T02:54:10Z

be/src/vec/exec/scan/scanner_scheduler.cpp

    bool eos = false;
+    Defer defer_scanner([&] {
+        if (status.ok() && !eos) {
+            // if status is not ok, it means the scanner is failed, and the counter may be not updated correctly, so no need to update counter again. if eos is true, it means the scanner is finished successfully, and the counter is updated correctly, so no need to update counter again.


update_scanner_profile() (which calls update_scan_cpu_timer() and update_realtime_counters()) is no longer executed for the non-eos path. Since start_scan_cpu_timer() resets _cpu_watch at the beginning of each _scanner_scan invocation, skipping update_scan_cpu_timer() at the end of a slice will drop CPU time for all partial scans and also stop realtime counters from being refreshed for workload-group policies. Consider calling update_scanner_profile() in the Defer cleanup (before start_wait_worker_timer()), and keep the existing eos fast-path if needed.

Suggested change

  // if status is not ok, it means the scanner is failed, and the counter may be not updated correctly, so no need to update counter again. if eos is true, it means the scanner is finished successfully, and the counter is updated correctly, so no need to update counter again.
+ update_scanner_profile();
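To see why skipping `update_scan_cpu_timer()` on non-eos slices loses time, here is a toy model of the concern raised in this review. It assumes, as the comment states, that `start_scan_cpu_timer()` resets the watch at the start of every `_scanner_scan` call; `Watch` and `reported_cpu_time` are hypothetical and count abstract ticks rather than real CPU time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical watch that, like the _cpu_watch described in the review, is
// reset every time a slice starts. Time is counted in abstract ticks.
struct Watch {
    int64_t started_at = 0;
    void reset(int64_t now) { started_at = now; }
    int64_t elapsed(int64_t now) const { return now - started_at; }
};

// Runs `slices` scheduling slices of `work` ticks each and returns the CPU time
// the profile would end up with. When `update_every_slice` is false, the watch
// is only folded into the total on the last (eos) slice.
int64_t reported_cpu_time(int slices, int64_t work, bool update_every_slice) {
    Watch cpu_watch;
    int64_t now = 0;
    int64_t total = 0;
    for (int i = 0; i < slices; ++i) {
        const bool eos = (i == slices - 1);
        cpu_watch.reset(now);  // start_scan_cpu_timer()
        now += work;           // scanning work done in this slice
        if (update_every_slice || eos) {
            total += cpu_watch.elapsed(now);  // update_scan_cpu_timer()
        }
        now += 1;  // time spent back in the worker queue between slices
    }
    return total;
}
```

For four slices of 10 ticks each, updating on every slice reports 40 ticks, while updating only on the eos slice reports 10, because each reset discards the previous slice's elapsed time.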

Copilot · 2026-03-25T02:54:10Z

be/src/vec/exec/scan/scanner_scheduler.cpp

+    // we set and get counter according below order, to make sure the counter is updated before get_block, and the time of get_block is recorded in the counter.
+    // 1. update_wait_worker_timer to make sure the time of waiting for worker thread is recorded in the timer
+    // 2. start_scan_cpu_timer to make sure the cpu timer include the time of open and get_block, which is the real cpu time of scanner
+    // 3. update_scan_cpu_timer when defer, to make sure the cpu timer include the time of open and get_block, which is the real cpu time of scanner
+    // 4. start_wait_worker_timer when defer, to make sure the time of waiting for worker thread is recorded in the timer
+


The new comment block claims "update_scan_cpu_timer when defer" / "start_wait_worker_timer when defer", but the current Defer only starts the wait-worker timer and does not update the CPU timer (and only does so conditionally). Please align the comments with the actual control flow, or reintroduce the deferred counter update so the documentation matches behavior.

Suggested change

- // we set and get counter according below order, to make sure the counter is updated before get_block, and the time of get_block is recorded in the counter.
- // 1. update_wait_worker_timer to make sure the time of waiting for worker thread is recorded in the timer
- // 2. start_scan_cpu_timer to make sure the cpu timer include the time of open and get_block, which is the real cpu time of scanner
- // 3. update_scan_cpu_timer when defer, to make sure the cpu timer include the time of open and get_block, which is the real cpu time of scanner
- // 4. start_wait_worker_timer when defer, to make sure the time of waiting for worker thread is recorded in the timer
+ // We set and get counters according to the order below, to make sure the counters are updated
+ // before get_block, and the time of get_block is recorded correctly:
+ // 1. update_wait_worker_timer to make sure the time of waiting for worker thread is recorded in the timer.
+ // 2. start_scan_cpu_timer to make sure the CPU timer includes the time of open and get_block, which is the real CPU time of the scanner.
+ // 3. update_scan_cpu_timer via update_scanner_profile before leaving the scanner, so the CPU timer reflects the work done.
+ // 4. start_wait_worker_timer in the deferred lambda (when status.ok() && !eos) to record the time of waiting for worker thread after this run.
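One plausible way the wait-worker timer could be over-counted, and how the `status.ok() && !eos` guard on step 4 avoids it, is sketched below. The scenario is an assumption for illustration only: it supposes that some later profile flush calls the update function after the scanner has already finished. `WaitTimer` and `wait_after_run` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wait-worker timer in abstract ticks; started == -1 means stopped.
struct WaitTimer {
    int64_t total = 0;
    int64_t started = -1;
    void start(int64_t now) { started = now; }
    void update(int64_t now) {
        if (started >= 0) {
            total += now - started;
            started = -1;
        }
    }
};

// One scanner that is queued for `queue` ticks, scans for `scan` ticks to eos,
// and then sits idle for `idle` ticks before an assumed final profile flush.
// `guard_eos` toggles the status.ok() && !eos check around step 4.
int64_t wait_after_run(int64_t queue, int64_t scan, int64_t idle, bool guard_eos) {
    WaitTimer wait;
    int64_t now = 0;
    wait.start(now);  // scanner submitted to the worker pool
    now += queue;
    wait.update(now);  // step 1: a worker picked the scanner up
    now += scan;       // steps 2-3: scanning until end-of-stream
    const bool status_ok = true;
    const bool eos = true;
    if (!guard_eos || (status_ok && !eos)) {
        wait.start(now);  // step 4: re-arm the wait timer
    }
    now += idle;
    wait.update(now);  // assumed later flush of the profile
    return wait.total;
}
```

With 5 ticks of queueing and 100 idle ticks after eos, the guarded version reports 5 ticks of waiting and the unguarded one reports 105.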

Copilot · 2026-03-25T02:54:11Z

be/src/vec/exec/scan/scanner_scheduler.cpp

+    bool need_update_profile = true;
    auto update_scanner_profile = [&]() {
        if (need_update_profile) {
            scanner->update_scan_cpu_timer();
            scanner->update_realtime_counters();
            need_update_profile = false;
        }


Changing need_update_profile from scanner->has_prepared() to unconditional true means update_scanner_profile() may call update_realtime_counters() even when prepare() fails and scanner internals are uninitialized (the old comment referenced a potential crash). If the intent is to always update CPU time, consider keeping a guard for update_realtime_counters() (e.g., if (scanner->has_prepared())) and/or making all update_realtime_counters() implementations defensively handle the unprepared state.
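A minimal sketch of the reviewer's suggestion: keep the once-only `need_update_profile` flag, always record CPU time, and call `update_realtime_counters()` only when `has_prepared()` is true. `SketchScanner` is a hypothetical type that just counts calls, and this is one possible shape rather than what the PR merged.

```cpp
#include <cassert>

// Hypothetical scanner that only counts calls; the method names echo the PR.
struct SketchScanner {
    bool prepared = false;
    int cpu_updates = 0;
    int realtime_updates = 0;

    bool has_prepared() const { return prepared; }
    void update_scan_cpu_timer() { ++cpu_updates; }
    void update_realtime_counters() { ++realtime_updates; }
};

// One possible shape for the suggestion: run at most once per slice (via the
// need_update_profile flag), always record CPU time, and touch realtime
// counters only after a successful prepare.
void update_scanner_profile(SketchScanner& scanner, bool& need_update_profile) {
    if (!need_update_profile) {
        return;
    }
    scanner.update_scan_cpu_timer();
    if (scanner.has_prepared()) {
        scanner.update_realtime_counters();
    }
    need_update_profile = false;
}
```

An unprepared scanner still gets its CPU time recorded once, while its realtime counters are never touched.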

hello-stephen · 2026-03-25T05:50:20Z

BE Regression && UT Coverage Report

Increment line coverage 77.78% (7/9) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.31% (25327/35519) |
| Line Coverage | 54.00% (267127/494649) |
| Region Coverage | 51.58% (220776/428007) |
| Branch Coverage | 53.06% (95215/179441) |

yiguolei · 2026-03-25T06:22:58Z

skip buildall

BiteTheDDDDt requested a review from yiguolei as a code owner March 25, 2026 02:50

Copilot AI review requested due to automatic review settings March 25, 2026 02:50

Copilot started reviewing on behalf of BiteTheDDDDt March 25, 2026 02:51

Copilot AI reviewed Mar 25, 2026

yiguolei approved these changes Mar 25, 2026

yiguolei merged commit 474cdf2 into apache:branch-4.0 Mar 25, 2026
35 of 38 checks passed

yiguolei merged 1 commit into apache:branch-4.0 from BiteTheDDDDt:pick_0325

5 participants
