
branch-4.0: [enhance](iceberg) Doris Iceberg Scan Metrics Integration #59010 #59095

Merged
yiguolei merged 1 commit into branch-4.0 from auto-pick-59010-branch-4.0
Dec 18, 2025


@github-actions

Cherry-picked from #59010

### What problem does this PR solve?

## Overview

This change pipes Apache Iceberg scan metrics directly into Doris query
profiles so operators can inspect per-scan statistics (files, bytes,
manifests, filters, etc.) from the FE profile UI. The integration
consists of three pieces:

1. **Summary Profile Slot** – `SummaryProfile` now includes an `Iceberg
Scan Metrics` entry in the execution summary list so the FE profile
table reserves space for the Iceberg telemetry.
2. **Metrics Reporter** – a new `IcebergMetricsReporter` implementation
formats `ScanReport` data (planning time, file counters, size counters,
delete-file stats, projected columns, metadata) and appends it to the
summary entry whenever an Iceberg scan runs.
3. **Scan Integration** – `IcebergScanNode` calls
`table.newScan().metricsReporter(new IcebergMetricsReporter())`,
ensuring every Iceberg table scan emits the metrics without requiring
catalog-level configuration changes.

All metrics remain scoped to Iceberg scans; other table formats are
untouched and still populate their own runtime-profile sections as
before.

## Implementation Details

### 1. `SummaryProfile`


`fe/fe-core/src/main/java/org/apache/doris/common/profile/SummaryProfile.java`
- Added the constant `ICEBERG_SCAN_METRICS = "Iceberg Scan Metrics"`.
- Inserted the key into `EXECUTION_SUMMARY_KEYS` (with indentation level
3) so the runtime profile tree displays it under the scheduling block
when present.
- The entry carries no default text: it shows `N/A` until an Iceberg
scan actually reports metrics.
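
A minimal sketch of how an ordered summary key with an indentation level and an `N/A` fallback might behave. The class `SummaryKeysSketch`, the `KEY_INDENT` map, the `render` helper, and the neighbouring `Schedule Time` key are illustrative assumptions, not the actual `SummaryProfile` code; only the constant's name and value and the indentation level 3 come from the description above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class SummaryKeysSketch {
    // Constant name and value as described in the PR.
    static final String ICEBERG_SCAN_METRICS = "Iceberg Scan Metrics";

    // Hypothetical stand-in for the ordered summary keys: key -> indentation level.
    static final Map<String, Integer> KEY_INDENT = new LinkedHashMap<>();

    static {
        KEY_INDENT.put("Schedule Time", 1);       // placeholder neighbour key (assumption)
        KEY_INDENT.put(ICEBERG_SCAN_METRICS, 3);  // indentation level 3, per the PR text
    }

    // Renders every registered key in order; keys without a value fall back to "N/A".
    static String render(Map<String, String> values) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : KEY_INDENT.entrySet()) {
            String value = values.getOrDefault(e.getKey(), "N/A");
            sb.append("  ".repeat(e.getValue()))
              .append("- ").append(e.getKey()).append(": ").append(value).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // No Iceberg scan reported anything, so the slot renders as N/A.
        System.out.print(render(new HashMap<>()));
    }
}
```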

### 2. `IcebergMetricsReporter`


`fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/profile/IcebergMetricsReporter.java`
- Implements `org.apache.iceberg.metrics.MetricsReporter`.
- On each `ScanReport`, it retrieves the current `SummaryProfile` from
`ConnectContext`, grabs the execution summary `RuntimeProfile`, and
appends a human-readable string for that scan.
- Metrics covered:
  - Table, snapshot ID, sanitized filter text, projected column list.
  - Planning time (pretty-printed duration plus operation count).
  - Result/skipped data and delete file counts.
  - Total file size / total delete file size (in readable units).
  - Manifest counts (scanned/skipped for data and delete manifests).
  - Indexed/equality/positional delete file counters.
  - Selected metadata keys (currently `scan-state`, `scan-id` if present).
- When multiple Iceberg scans run in one query, each scan’s line is
appended on a new line under the same summary key.
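
The append-per-scan behaviour described above can be sketched roughly as follows. To stay self-contained, this does not use the Iceberg or Doris classes: `Report` is a hypothetical stand-in for a few `ScanReport` fields, the `summary` map stands in for the execution summary, and `readableSize` is an assumed formatting helper, so the real reporter's output format may differ.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ScanReporterSketch {
    static final String KEY = "Iceberg Scan Metrics";

    /** Hypothetical stand-in for the handful of ScanReport fields used here. */
    static final class Report {
        final String table;
        final long snapshotId;
        final long planningMs;
        final int dataFiles;
        final long totalBytes;

        Report(String table, long snapshotId, long planningMs, int dataFiles, long totalBytes) {
            this.table = table;
            this.snapshotId = snapshotId;
            this.planningMs = planningMs;
            this.dataFiles = dataFiles;
            this.totalBytes = totalBytes;
        }
    }

    // Hypothetical stand-in for the execution summary's key/value store.
    private final Map<String, String> summary = new LinkedHashMap<>();

    // Converts a byte count to a readable unit with three decimals (assumed format).
    static String readableSize(long bytes) {
        String[] units = {"B", "KB", "MB", "GB", "TB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.3f %s", value, units[unit]);
    }

    // Formats one scan and appends it on a new line under the shared summary key.
    void report(Report r) {
        String line = String.format(Locale.ROOT,
                "Table Scan (%s): snapshot=%d, planning=%dms, data_files=%d, total_size=%s",
                r.table, r.snapshotId, r.planningMs, r.dataFiles, readableSize(r.totalBytes));
        summary.merge(KEY, line, (existing, added) -> existing + "\n" + added);
    }

    String summaryText() {
        return summary.getOrDefault(KEY, "N/A");
    }

    public static void main(String[] args) {
        ScanReporterSketch reporter = new ScanReporterSketch();
        reporter.report(new Report("db.t1", 1L, 7L, 3, 1937L));
        reporter.report(new Report("db.t2", 2L, 5L, 1, 10L));
        System.out.println(reporter.summaryText());
    }
}
```

Using `Map.merge` keeps the first scan's text intact and adds each later scan on its own line, which mirrors the multi-scan behaviour described in the last bullet.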

### 3. `IcebergScanNode`


`fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java`
- Replaces `icebergTable.newScan()` with
`icebergTable.newScan().metricsReporter(new IcebergMetricsReporter())`
inside `createTableScan()`.
- This keeps catalog properties untouched and leverages Iceberg’s
per-scan API to attach reporters.
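
To illustrate why a per-scan reporter needs no catalog-level configuration, here is a small sketch with stand-in types. `Scan`, `newScan()`, and the `Consumer<String>` reporter are hypothetical simplifications of Iceberg's scan and `MetricsReporter` types; the sketch assumes, as with Iceberg's scan refinement methods, that attaching a reporter yields a new scan object rather than mutating shared state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class PerScanReporterSketch {
    /** Hypothetical stand-in for a table scan that can carry its own reporter. */
    static final class Scan {
        private final Consumer<String> reporter;  // null means no reporter attached

        Scan(Consumer<String> reporter) {
            this.reporter = reporter;
        }

        // Returns a new scan with the reporter attached; the original is unchanged.
        Scan metricsReporter(Consumer<String> newReporter) {
            return new Scan(newReporter);
        }

        // Simulates planning and notifies the reporter, if any.
        void planFiles() {
            if (reporter != null) {
                reporter.accept("scan planned");
            }
        }
    }

    // Stand-in for table.newScan(): a scan with no reporter configured.
    static Scan newScan() {
        return new Scan(null);
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        newScan().metricsReporter(seen::add).planFiles();  // this scan reports
        newScan().planFiles();                             // this one stays silent
        System.out.println(seen);
    }
}
```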

## Example Profile Output

After running a query against `iceberg_docker.test_db.ts_identity`, the
FE profile shows:

```
  Iceberg Scan Metrics:
    Table Scan (iceberg_docker.test_db.ts_identity):
       - table: iceberg_docker.test_db.ts_identity
       - snapshot: 6315378011972705169
       - filter: true
       - columns: id|ts
       - planning: 7ms (1 ops)
       - data_files: 3
       - delete_files: 0
       - skipped_data_files: 0
       - skipped_delete_files: 0
       - total_size: 1.892 KB
       - total_delete_size: 0.000 
       - scanned_manifests: 1
       - skipped_manifests: 0
       - scanned_delete_manifests: 0
       - skipped_delete_manifests: 0
       - indexed_delete_files: 0
       - equality_delete_files: 0
       - positional_delete_files: 0
```
@github-actions github-actions bot requested a review from yiguolei as a code owner December 16, 2025 14:24
@yiguolei

run buildall

@hello-stephen

FE UT Coverage Report

Increment line coverage 1.28% (1/78) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 18, 2025
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@yiguolei yiguolei closed this Dec 18, 2025
@yiguolei yiguolei reopened this Dec 18, 2025
@yiguolei yiguolei merged commit 7b13194 into branch-4.0 Dec 18, 2025
24 of 27 checks passed
@github-actions github-actions bot deleted the auto-pick-59010-branch-4.0 branch December 18, 2025 06:03
