branch-4.1:[improvement](executor) unify current query runtime statistics and expose task progress (#60567) #63130

Merged
yiguolei merged 1 commit into apache:branch-4.1 from HYDCP:branch-4.1-query-progress on May 11, 2026

@wenzhenghu (Contributor) commented May 11, 2026

Cherry-pick from #60567
PR Summary

  • This PR unifies current-query runtime statistics onto the BE -> FE reporting pipeline, replacing the previous ad-hoc RuntimeProfile traversal path, and enriches current_queries with task-level progress plus broader resource metrics.
  • The goal is to make current-query visibility more real-time and consistent with audit statistics while simplifying and consolidating FE proc/REST surfaces.

What It Solves

  • Unifies statistics source: QeProcessorImpl now reads aggregated TQueryStatistics from WorkloadRuntimeStatusMgr instead of relying on the legacy CurrentQueryInfoProvider path.
  • Improves progress observability: introduces process_rows, total_tasks_num, and finished_tasks_num, and exposes computed Progress.
  • Expands runtime metrics coverage: current_queries now includes richer scan/cpu/memory/shuffle/spill/cache counters.
  • Consolidates query views: /current_queries and /current_query_stmts now share the same statistics view; the legacy per-query/per-fragment proc drill-down implementation is removed.

Implementation Details

  • Protocol layer:
    • Extends TQueryStatistics with process_rows, finished_tasks_num, and total_tasks_num.
  • BE collection/reporting:
    • Accumulates process_rows in the execution path.
    • Records total_tasks_num at pipeline task graph initialization and increments finished_tasks_num in real time when tasks close.
    • Mirrors task-progress counters into QueryTaskController so the counters remain available even after QueryContext teardown.
    • Exports the new fields in ResourceContext::to_thrift_query_statistics.
  • FE aggregation/retention:
    • WorkloadRuntimeStatusMgr merges the additional fields (including task progress) and refines timeout cleanup: query stats are removed only when they have timed out and the query no longer exists in FE.
    • QueryStatisticsItem now carries TQueryStatistics as the unified data carrier for proc/REST.
  • Presentation layer:
    • CurrentQueryStatisticsProcDir adds expanded columns and computes Progress.
    • /rest/v2/manager/query/current_queries in QueryProfileAction now serves the same unified stats view.
    • Removes legacy classes: CurrentQueryInfoProvider, CurrentQuerySqlProcDir, CurrentQueryFragmentProcNode, and CurrentQueryStatementsProcNode.
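The FE retention rule described above (drop stats only when they have timed out and the query is gone from FE) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Doris implementation: `BeReport`, `QueryEntry`, `record_report`, `merged`, and `cleanup` are hypothetical names, and it is assumed that each BE sends cumulative counters that are summed across BEs.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class BeReport:
    """Hypothetical subset of the per-BE statistics for one query."""
    process_rows: int
    total_tasks_num: int
    finished_tasks_num: int


@dataclass
class QueryEntry:
    """Hypothetical FE-side record: latest report per BE plus last update time."""
    last_report_ms: int
    per_be: Dict[str, BeReport] = field(default_factory=dict)


def record_report(stats: Dict[str, QueryEntry], query_id: str, be_id: str,
                  report: BeReport, now_ms: int) -> None:
    # Assumption: reports are cumulative, so the newest one replaces the old one.
    entry = stats.setdefault(query_id, QueryEntry(last_report_ms=now_ms))
    entry.per_be[be_id] = report
    entry.last_report_ms = now_ms


def merged(entry: QueryEntry) -> BeReport:
    # Assumption: counters are additive across BEs.
    reports = entry.per_be.values()
    return BeReport(
        process_rows=sum(r.process_rows for r in reports),
        total_tasks_num=sum(r.total_tasks_num for r in reports),
        finished_tasks_num=sum(r.finished_tasks_num for r in reports),
    )


def cleanup(stats: Dict[str, QueryEntry], alive_query_ids: Set[str],
            now_ms: int, timeout_ms: int) -> None:
    # Remove an entry only if it has timed out AND the query no longer exists.
    for query_id in list(stats):
        timed_out = now_ms - stats[query_id].last_report_ms > timeout_ms
        if timed_out and query_id not in alive_query_ids:
            del stats[query_id]
```

Requiring both conditions means a long-running query whose BE reports are delayed keeps its statistics for as long as FE still tracks the query.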
*************************** 1. row ***************************
                       QueryId: e00b00b1155d4042-98862b60016a768a
                  ConnectionId: 394
                       Catalog: internal
                      Database: wzhtest
                          User: root
                      ExecTime: 20717
                       SqlHash: cf263b08302d8be436c97dd5e6f0d283
                     Statement: INSERT INTO test_query_progress_tb   SELECT DISTINCT k, CONCAT(v, CAST(k AS STRING))   FROM test_query_progress_tb   WHERE k % 2 = 0
                      ScanRows: 45400000 Rows
                     ScanBytes: 2.70 GB
                   ProcessRows: 75598123 Rows
                         CpuMs: 178336
            MaxPeakMemoryBytes: 13.03 GB
        CurrentUsedMemoryBytes: 8.69 GB
               WorkloadGroupId: 1777125330381
              ShuffleSendBytes: 0.00
               ShuffleSendRows: 0 Rows
     ScanBytesFromLocalStorage: 31.48 MB
    ScanBytesFromRemoteStorage: 0.00
 SpillWriteBytesToLocalStorage: 0.00
SpillReadBytesFromLocalStorage: 0.00
           BytesWriteIntoCache: 0.00
                    TotalTasks: 74
                 FinishedTasks: 51
                      Progress: 68%
------------------------
-- first --
  QueryId: e2b8c99658a94743-9ebbf0d036d83295
  ConnectionId: 9
  Catalog: hive_test
  Database: tpch100_parquet
  User: root
  ExecTime: 6093
  SqlHash: f8a30e4182d72cce3eff6cb385005b1f
  Statement: select ... from supplier, lineitem l1, orders, nation ... limit 100
  ScanRows: 621466194 Rows
  ScanBytes: 5.37 GB
  ProcessRows: 79079742 Rows
  CpuMs: 31655
  MaxPeakMemoryBytes: 2.32 GB
  CurrentUsedMemoryBytes: 2.18 GB
  WorkloadGroupId: 1777253545394
  ShuffleSendBytes: 0.00
  ShuffleSendRows: 0 Rows
  ScanBytesFromLocalStorage: 0.00
  ScanBytesFromRemoteStorage: 5.37 GB
  SpillWriteBytesToLocalStorage: 0.00
  SpillReadBytesFromLocalStorage: 0.00
  BytesWriteIntoCache: 0.00
  TotalTasks: 138
  FinishedTasks: 49
  Progress: 35%
-- second --
  QueryId: e2b8c99658a94743-9ebbf0d036d83295
  ConnectionId: 9
  Catalog: hive_test
  Database: tpch100_parquet
  User: root
  ExecTime: 10807
  SqlHash: f8a30e4182d72cce3eff6cb385005b1f
  Statement: select ... from supplier, lineitem l1, orders, nation ... limit 100
  ScanRows: 1102562592 Rows
  ScanBytes: 9.20 GB
  ProcessRows: 112176670 Rows
  CpuMs: 53808
  MaxPeakMemoryBytes: 3.13 GB
  CurrentUsedMemoryBytes: 2.50 GB
  WorkloadGroupId: 1777253545394
  ShuffleSendBytes: 0.00
  ShuffleSendRows: 0 Rows
  ScanBytesFromLocalStorage: 0.00
  ScanBytesFromRemoteStorage: 9.20 GB
  SpillWriteBytesToLocalStorage: 0.00
  SpillReadBytesFromLocalStorage: 0.00
  BytesWriteIntoCache: 0.00
  TotalTasks: 138
  FinishedTasks: 65
  Progress: 47%
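
The Progress values in the samples above are consistent with the finished-to-total task ratio truncated to a whole percent (51/74, 49/138, and 65/138). A minimal sketch of that calculation follows; the function name `progress_percent`, the truncating rounding, and the handling of a zero total are assumptions inferred from the sample output, not confirmed details of `CurrentQueryStatisticsProcDir`.

```python
def progress_percent(finished_tasks: int, total_tasks: int) -> str:
    """Format task progress as a whole-number percentage string.

    Assumptions (not confirmed by the PR): integer truncation is used,
    and an unknown or zero total is reported as 0%.
    """
    if total_tasks <= 0:
        return "0%"
    # Clamp so a late or duplicated counter update cannot exceed 100%.
    finished = min(max(finished_tasks, 0), total_tasks)
    return f"{finished * 100 // total_tasks}%"


# Values taken from the three sample rows above.
print(progress_percent(51, 74))
print(progress_percent(49, 138))
print(progress_percent(65, 138))
```

Because the ratio counts pipeline tasks rather than rows or bytes, it indicates how much of the task graph has closed, not how much time remains.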

Check List (For Author)

  • Test

    • [x] Regression test
    • [x] Unit Test
    • [x] Manual test (add detailed scripts or steps below)
    • [ ] No need to test or manual test. Explain why:
      • [ ] This is a refactor/code format and no logic has been changed.
      • [ ] Previous test can cover this change.
      • [ ] No code files have been changed.
      • [ ] Other reason
  • Behavior changed:

    • [ ] No.
    • [x] Yes.
  • Does this need documentation?

    • [ ] No.
    • [x] Yes.

Check List (For Reviewer who merge this PR)

  • [ ] Confirm the release note
  • [ ] Confirm test cases
  • [ ] Confirm document
  • [ ] Add branch pick label

…pose task progress (apache#60567)

- Test
    - [x] Regression test
    - [x] Unit Test
    - [x] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason

- Behavior changed:
    - [ ] No.
    - [x] Yes.

- Does this need documentation?
    - [ ] No.
    - [x] Yes.

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label

---------

Co-authored-by: yiguolei <guolei@selectdb.com>
Co-authored-by: xuchenhao <419062425@qq.com>
Co-authored-by: xuchenhao <48084123+xuchenhao@users.noreply.github.com>
@wenzhenghu wenzhenghu requested a review from yiguolei as a code owner May 11, 2026 07:37
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wenzhenghu
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 78.11% (1848/2366) |
| Line Coverage | 64.84% (33221/51232) |
| Region Coverage | 65.34% (16443/25167) |
| Branch Coverage | 55.89% (8781/15710) |

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 42.53% (37/87) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 93.94% (31/33) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.30% (20090/37691) |
| Line Coverage | 36.80% (189411/514721) |
| Region Coverage | 33.13% (147311/444682) |
| Branch Coverage | 34.20% (64434/188411) |

@yiguolei
Contributor

skip buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (33/33) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.50% (26371/36885) |
| Line Coverage | 54.50% (279518/512869) |
| Region Coverage | 51.78% (232278/448582) |
| Branch Coverage | 53.15% (100414/188925) |

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 47.79% (65/136) 🎉
Increment coverage report
Complete coverage report

@github-actions bot added the approved label May 11, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 2a5147b into apache:branch-4.1 May 11, 2026
33 of 35 checks passed

Labels

approved, reviewed
