Skip to content

[improvement](fe) Add external table metadata profile details#63648

Open
Gabriel39 wants to merge 6 commits into
apache:masterfrom
Gabriel39:dev_0526
Open

[improvement](fe) Add external table metadata profile details#63648
Gabriel39 wants to merge 6 commits into
apache:masterfrom
Gabriel39:dev_0526

Conversation

@Gabriel39
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #25867

Related PR: #xxx

Problem Summary: External table queries previously showed only coarse FE metadata time in profile, making it hard to locate slow metadata access steps for Hive, Iceberg, Hudi, and Paimon scans. This change records dedicated profile timings for external partition value loading, partition metadata loading, partition file listing, and file scan task planning. The scan nodes keep a SummaryProfile reference so asynchronous split planning can also report its metadata time.

Release note

Expose more detailed FE external table metadata timing in query profile.

Check List (For Author)

  • Test: Unit Test
    • Added SummaryProfile unit coverage for the new external table metadata profile fields.
    • git diff --check and git diff --cached --check passed.
    • ./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest was attempted but could not run because this environment only has JDK 11 and the FE test script requires JDK 17.
  • Behavior changed: Yes. Query profile now includes more detailed external table metadata timing fields.
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: close apache#25867

Related PR: #xxx

Problem Summary: External table queries previously showed only coarse FE metadata time in profile, making it hard to locate slow metadata access steps for Hive, Iceberg, Hudi, and Paimon scans. This change records dedicated profile timings for external partition value loading, partition metadata loading, partition file listing, and file scan task planning. The scan nodes keep a SummaryProfile reference so asynchronous split planning can also report its metadata time.

### Release note

Expose more detailed FE external table metadata timing in query profile.

### Check List (For Author)

- Test: Unit Test
    - Added SummaryProfile unit coverage for the new external table metadata profile fields.
    - `git diff --check` and `git diff --cached --check` passed.
    - `./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest` was attempted but could not run because this environment only has JDK 11 and the FE test script requires JDK 17.
- Behavior changed: Yes. Query profile now includes more detailed external table metadata timing fields.
- Does this need documentation: No
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Gabriel39
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two profile correctness issues that should be fixed before approval.

Critical checkpoint conclusions:

  • Goal/test: The PR aims to expose detailed external table metadata timing, with unit coverage for SummaryProfile aggregation only. The scan-node integration has untested path coverage gaps.
  • Scope/clarity: The change is focused, but two inserted timers do not match the actual invocation/lazy-execution flow.
  • Concurrency: The added SummaryProfile accumulation methods are synchronized and are used from async split planning paths; no lock-order issue found, but timing must be placed around the actual async/lazy work.
  • Lifecycle/static init: No special lifecycle or static initialization issue found.
  • Configuration/compatibility: No new config, protocol, storage, or incompatible format change found.
  • Parallel paths: Hive/Paimon paths are mostly covered, but Hudi normal partition initialization and Iceberg batch/no-manifest-cache planning paths are missed or under-counted.
  • Error handling: No new silent failure path found in the reviewed changes.
  • Test coverage: Current tests cover only SummaryProfile aggregation formatting, not the affected Hudi/Iceberg scan-node control flows.
  • Observability/performance: The feature improves observability, but the two issues below make reported timings misleading for common external table paths.
  • User focus: No additional user-provided review focus was specified.

Because these are user-visible profile accuracy issues in the feature being added, I am requesting changes.

long startTime = System.currentTimeMillis();
try {
prunedPartitions = hmsTable.getCatalog().getExecutionAuthenticator().execute(()
-> getPrunedPartitions(hudiClient));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timer is skipped in the normal planning flow. FileQueryScanNode.createScanRangeLocations() calls isBatchMode() before getSplits(), and Hudi's isBatchMode() initializes prunedPartitions and sets partitionInit = true for full-table scans. By the time execution reaches this if (!partitionInit) block, it is already false, so External Table Get Partitions Time and the aggregate external meta time do not include Hudi partition pruning for ordinary Hudi scans. Please time the partition initialization where isBatchMode() does it, or refactor the partition initialization into one helper that records the profile exactly once.

return doPlanFileScanTask(scan);
} finally {
if (getSummaryProfile() != null) {
getSummaryProfile().addExternalTableGetFileScanTasksTime(System.currentTimeMillis() - startTime);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This records only the time to create the CloseableIterable, not necessarily the time to plan/iterate the file scan tasks. In the manifest-cache-disabled path, batch mode returns TableScanUtil.splitFiles(scan.planFiles(), ...), and the actual Iceberg planFiles() iteration happens later in doStartSplit()/doGetSplits() after this finally has already added the metric. For batch Iceberg scans without manifest cache, the new External Table Get File Scan Tasks Time can therefore be near zero while the expensive metadata planning happens outside the timer. Please measure around the iteration or return a timed iterable so the recorded value covers the actual scan-task planning work.

Gabriel39 added 2 commits May 26, 2026 10:38
### What problem does this PR solve?

Issue Number: close apache#25867

Related PR: #xxx

Problem Summary: After adding split-level external table metadata profile details, some FE-side metadata requests were still only visible as coarse scan initialization or planning time. This change adds a table metadata profile bucket and records table/schema/timeline/fs-view initialization requests for Hive, Iceberg, Hudi, and Paimon scans. Hudi partition value loading is also recorded under the partition values metric so partition pruning metadata work is visible in profile.

### Release note

Expose table metadata initialization time in external table query profiles.

### Check List (For Author)

- Test: Unit Test
    - Updated SummaryProfile unit coverage for table metadata timing.
    - `git diff --check` and `git diff --cached --check` passed.
    - `./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest` was attempted but could not run because this environment only has JDK 11 and the FE test script requires JDK 17.
- Behavior changed: Yes. Query profile now includes table metadata initialization timing for external table scans.
- Does this need documentation: No
Issue Number: close apache#25867

Related PR: apache#63648

Problem Summary: Review found two profile accuracy issues in external table metadata timers. Hudi full-table scans initialize pruned partitions from isBatchMode before getSplits, so timing in getSplits could be skipped. Iceberg file scan task planning can be lazy, so timing only around CloseableIterable creation missed the actual iteration work, especially in batch mode without manifest cache. This change records Hudi partition initialization through one helper used by both paths and moves Iceberg scan task timing around the actual task iteration.

None

- Test: Unit Test
    - `git diff --check` and `git diff --cached --check` passed.
    - `./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest` was attempted but could not run because this environment only has JDK 11 and the FE test script requires JDK 17.
- Behavior changed: No
- Does this need documentation: No
@Gabriel39
Copy link
Copy Markdown
Contributor Author

/review

@Gabriel39
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: I found one additional profiling correctness issue in the current PR. The earlier Hudi partition-pruning and Iceberg lazy-iteration timer concerns are already present in existing review threads and are not repeated here.

Critical checkpoint conclusions:

  • Goal/test: The PR aims to expose finer external table metadata timings. SummaryProfile unit coverage checks aggregation, but this Hudi empty-table path is not covered.
  • Scope/clarity: The changes are generally focused on profiling.
  • Concurrency: Async split paths update SummaryProfile through synchronized adders; no data correctness issue found from those updates.
  • Lifecycle/static initialization: No static initialization or special lifecycle issue found.
  • Configuration/compatibility: No new configs or storage/protocol compatibility changes.
  • Parallel paths: Hive/Iceberg/Paimon/Hudi paths were reviewed; the remaining distinct gap is Hudi initialization with no completed instant.
  • Tests/results: Existing test verifies aggregation only; it does not cover scan-node timing branches.
  • Observability: The issue below causes External Table Meta Time to omit successful Hudi table metadata work for empty timelines.

User focus: No additional user-provided review focus was specified.

initSchemaParams();

long startTime = System.currentTimeMillis();
hudiClient = hmsTable.getHudiClient();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timer is not recorded on the successful empty-table path. startTime is set here, but if the Hudi timeline has no completed instant, lines 211-215 set prunedPartitions and partitionInit and then return before the new addExternalTableGetTableMetaTime(...) call below. As a result, an empty Hudi table still does the metadata work (getHudiClient, reloadActiveTimeline, remote table fields, timeline lookup) but External Table Meta Time remains missing it. Please put the table-meta accounting in a finally or record it before this early return.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: one remaining profiling accuracy issue. The previous Hudi partition-init and Iceberg lazy-iteration timing concerns appear addressed in the current head, and I did not find additional issues for the user focus because no extra focus was provided.

Critical checkpoints: Goal: the PR adds detailed external table metadata timing; mostly achieved, but Hudi empty-timeline initialization still misses the new table-meta bucket. Scope: the changes are focused on FE profile timing. Concurrency: async split planning updates the captured SummaryProfile and the new add methods are synchronized, so no new data race was found. Lifecycle/static initialization: no static initialization hazard found. Configuration/compatibility: no new config, persisted format, or FE-BE protocol change requiring compatibility handling found. Parallel paths: Hive, Iceberg, Hudi, and Paimon are covered, with the remaining Hudi early-return gap noted inline. Tests: SummaryProfile unit coverage was added, but no end-to-end external catalog profile test is present and FE UT was not run in this environment per the PR description. Observability: this is an observability change; the remaining issue affects metric accuracy for a concrete Hudi path. Transactions/data writes: not applicable.

initBackendPolicy();
initSchemaParams();

long startTime = System.currentTimeMillis();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table-metadata timer is still skipped for Hudi tables with no completed instant. In doInitialize(), after loading the timeline, the !snapshotInstant.isPresent() branch sets prunedPartitions and returns before reaching addExternalTableGetTableMetaTime(...). That path has already done Hudi client/timeline metadata work (getHudiClient(), reloadActiveTimeline(), getCommitsAndCompactionTimeline()), but the new External Table Get Table Meta Time and aggregate external meta time remain N/A/zero for an empty Hudi table. Please record this with a finally around the initialized section or add the metric before that early return.

### What problem does this PR solve?

Issue Number: close apache#25867

Related PR: apache#63648

Problem Summary: Hudi scan initialization returned early for tables with no completed instant before the external table metadata timer was recorded. Move the table metadata profile accounting into a finally block so metadata work such as client creation, timeline reload, remote table field reads, and timeline lookup is included for the empty-table path as well.

### Release note

None

### Check List (For Author)

- Test: git diff --check; git diff --cached --check; attempted ./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest, but it failed because JAVA_HOME points to JDK 11 and JDK_17 is not set
- Behavior changed: No
- Does this need documentation: No
@Gabriel39
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the current head for the external table metadata profiling changes across SummaryProfile, FileQueryScanNode, Hive/Hudi/Iceberg/Paimon scan paths, and the SummaryProfile unit test. I did not find additional blocking issues distinct from the existing review threads. The previously raised Hudi empty-table, Hudi isBatchMode partition initialization, and Iceberg lazy planFiles timing concerns appear addressed in the current code.

Critical checkpoint conclusions:

  • Goal and proof: The PR adds detailed external table metadata timing buckets and wires them into Hive, Hudi, Iceberg, and Paimon planning/split paths. SummaryProfile has unit coverage for aggregation/display of the new fields.
  • Scope and clarity: The change is focused on profile accounting and keeps existing scan behavior largely unchanged.
  • Concurrency: The main concurrency-sensitive additions are async split planning updates to SummaryProfile. The accumulator methods are synchronized, and scan nodes keep a SummaryProfile reference so async threads do not depend on ThreadLocal ConnectContext.
  • Lifecycle/static initialization: No new static initialization dependency or non-obvious lifecycle ownership issue found.
  • Configuration/compatibility: No new config, protocol field, persistence format, or rolling-upgrade compatibility concern found.
  • Parallel paths: Hive, Hudi, Iceberg, and Paimon paths are covered. I also checked batch and non-batch split paths for the changed code.
  • Tests: Only SummaryProfile unit coverage is added. I did not run FE tests in this runner; author notes JDK 17 was unavailable. No additional user-provided review focus was specified.

@Gabriel39
Copy link
Copy Markdown
Contributor Author

run buildall

### What problem does this PR solve?

Issue Number: close apache#25867

Related PR: apache#63648

Problem Summary: Hive partition initialization caught AnalysisException even though the profiled try block no longer calls code that throws that checked exception. Remove the unreachable catch block to fix Java compilation while keeping runtime exception profiling unchanged.

### Release note

None

### Check List (For Author)

- Test: git diff --check; git diff --cached --check
- Behavior changed: No
- Does this need documentation: No
@Gabriel39
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 9.13% (24/263) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants