[improvement](fe) Add external table metadata profile details#63648
[improvement](fe) Add external table metadata profile details#63648Gabriel39 wants to merge 6 commits into
Conversation
### What problem does this PR solve? Issue Number: close apache#25867 Related PR: #xxx Problem Summary: External table queries previously showed only coarse FE metadata time in profile, making it hard to locate slow metadata access steps for Hive, Iceberg, Hudi, and Paimon scans. This change records dedicated profile timings for external partition value loading, partition metadata loading, partition file listing, and file scan task planning. The scan nodes keep a SummaryProfile reference so asynchronous split planning can also report its metadata time. ### Release note Expose more detailed FE external table metadata timing in query profile. ### Check List (For Author) - Test: Unit Test - Added SummaryProfile unit coverage for the new external table metadata profile fields. - `git diff --check` and `git diff --cached --check` passed. - `./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest` was attempted but could not run because this environment only has JDK 11 and the FE test script requires JDK 17. - Behavior changed: Yes. Query profile now includes more detailed external table metadata timing fields. - Does this need documentation: No
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
I found two profile correctness issues that should be fixed before approval.
Critical checkpoint conclusions:
- Goal/test: The PR aims to expose detailed external table metadata timing, with unit coverage for SummaryProfile aggregation only. The scan-node integration has untested path coverage gaps.
- Scope/clarity: The change is focused, but two inserted timers do not match the actual invocation/lazy-execution flow.
- Concurrency: The added SummaryProfile accumulation methods are synchronized and are used from async split planning paths; no lock-order issue found, but timing must be placed around the actual async/lazy work.
- Lifecycle/static init: No special lifecycle or static initialization issue found.
- Configuration/compatibility: No new config, protocol, storage, or incompatible format change found.
- Parallel paths: Hive/Paimon paths are mostly covered, but Hudi normal partition initialization and Iceberg batch/no-manifest-cache planning paths are missed or under-counted.
- Error handling: No new silent failure path found in the reviewed changes.
- Test coverage: Current tests cover only SummaryProfile aggregation formatting, not the affected Hudi/Iceberg scan-node control flows.
- Observability/performance: The feature improves observability, but the two issues below make reported timings misleading for common external table paths.
- User focus: No additional user-provided review focus was specified.
Because these are user-visible profile accuracy issues in the feature being added, I am requesting changes.
| long startTime = System.currentTimeMillis(); | ||
| try { | ||
| prunedPartitions = hmsTable.getCatalog().getExecutionAuthenticator().execute(() | ||
| -> getPrunedPartitions(hudiClient)); |
There was a problem hiding this comment.
This timer is skipped in the normal planning flow. FileQueryScanNode.createScanRangeLocations() calls isBatchMode() before getSplits(), and Hudi's isBatchMode() initializes prunedPartitions and sets partitionInit = true for full-table scans. By the time execution reaches this if (!partitionInit) block, it is already false, so External Table Get Partitions Time and the aggregate external meta time do not include Hudi partition pruning for ordinary Hudi scans. Please time the partition initialization where isBatchMode() does it, or refactor the partition initialization into one helper that records the profile exactly once.
| return doPlanFileScanTask(scan); | ||
| } finally { | ||
| if (getSummaryProfile() != null) { | ||
| getSummaryProfile().addExternalTableGetFileScanTasksTime(System.currentTimeMillis() - startTime); |
There was a problem hiding this comment.
This records only the time to create the CloseableIterable, not necessarily the time to plan/iterate the file scan tasks. In the manifest-cache-disabled path, batch mode returns TableScanUtil.splitFiles(scan.planFiles(), ...), and the actual Iceberg planFiles() iteration happens later in doStartSplit()/doGetSplits() after this finally has already added the metric. For batch Iceberg scans without manifest cache, the new External Table Get File Scan Tasks Time can therefore be near zero while the expensive metadata planning happens outside the timer. Please measure around the iteration or return a timed iterable so the recorded value covers the actual scan-task planning work.
### What problem does this PR solve? Issue Number: close apache#25867 Related PR: #xxx Problem Summary: After adding split-level external table metadata profile details, some FE-side metadata requests were still only visible as coarse scan initialization or planning time. This change adds a table metadata profile bucket and records table/schema/timeline/fs-view initialization requests for Hive, Iceberg, Hudi, and Paimon scans. Hudi partition value loading is also recorded under the partition values metric so partition pruning metadata work is visible in profile. ### Release note Expose table metadata initialization time in external table query profiles. ### Check List (For Author) - Test: Unit Test - Updated SummaryProfile unit coverage for table metadata timing. - `git diff --check` and `git diff --cached --check` passed. - `./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest` was attempted but could not run because this environment only has JDK 11 and the FE test script requires JDK 17. - Behavior changed: Yes. Query profile now includes table metadata initialization timing for external table scans. - Does this need documentation: No
Issue Number: close apache#25867 Related PR: apache#63648 Problem Summary: Review found two profile accuracy issues in external table metadata timers. Hudi full-table scans initialize pruned partitions from isBatchMode before getSplits, so timing in getSplits could be skipped. Iceberg file scan task planning can be lazy, so timing only around CloseableIterable creation missed the actual iteration work, especially in batch mode without manifest cache. This change records Hudi partition initialization through one helper used by both paths and moves Iceberg scan task timing around the actual task iteration. None - Test: Unit Test - `git diff --check` and `git diff --cached --check` passed. - `./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest` was attempted but could not run because this environment only has JDK 11 and the FE test script requires JDK 17. - Behavior changed: No - Does this need documentation: No
|
/review |
|
/review |
There was a problem hiding this comment.
Summary: I found one additional profiling correctness issue in the current PR. The earlier Hudi partition-pruning and Iceberg lazy-iteration timer concerns are already present in existing review threads and are not repeated here.
Critical checkpoint conclusions:
- Goal/test: The PR aims to expose finer external table metadata timings. SummaryProfile unit coverage checks aggregation, but this Hudi empty-table path is not covered.
- Scope/clarity: The changes are generally focused on profiling.
- Concurrency: Async split paths update SummaryProfile through synchronized adders; no data correctness issue found from those updates.
- Lifecycle/static initialization: No static initialization or special lifecycle issue found.
- Configuration/compatibility: No new configs or storage/protocol compatibility changes.
- Parallel paths: Hive/Iceberg/Paimon/Hudi paths were reviewed; the remaining distinct gap is Hudi initialization with no completed instant.
- Tests/results: Existing test verifies aggregation only; it does not cover scan-node timing branches.
- Observability: The issue below causes External Table Meta Time to omit successful Hudi table metadata work for empty timelines.
User focus: No additional user-provided review focus was specified.
| initSchemaParams(); | ||
|
|
||
| long startTime = System.currentTimeMillis(); | ||
| hudiClient = hmsTable.getHudiClient(); |
There was a problem hiding this comment.
This timer is not recorded on the successful empty-table path. startTime is set here, but if the Hudi timeline has no completed instant, lines 211-215 set prunedPartitions and partitionInit and then return before the new addExternalTableGetTableMetaTime(...) call below. As a result, an empty Hudi table still does the metadata work (getHudiClient, reloadActiveTimeline, remote table fields, timeline lookup) but External Table Meta Time remains missing it. Please put the table-meta accounting in a finally or record it before this early return.
There was a problem hiding this comment.
Review result: one remaining profiling accuracy issue. The previous Hudi partition-init and Iceberg lazy-iteration timing concerns appear addressed in the current head, and I did not find additional issues for the user focus because no extra focus was provided.
Critical checkpoints: Goal: the PR adds detailed external table metadata timing; mostly achieved, but Hudi empty-timeline initialization still misses the new table-meta bucket. Scope: the changes are focused on FE profile timing. Concurrency: async split planning updates the captured SummaryProfile and the new add methods are synchronized, so no new data race was found. Lifecycle/static initialization: no static initialization hazard found. Configuration/compatibility: no new config, persisted format, or FE-BE protocol change requiring compatibility handling found. Parallel paths: Hive, Iceberg, Hudi, and Paimon are covered, with the remaining Hudi early-return gap noted inline. Tests: SummaryProfile unit coverage was added, but no end-to-end external catalog profile test is present and FE UT was not run in this environment per the PR description. Observability: this is an observability change; the remaining issue affects metric accuracy for a concrete Hudi path. Transactions/data writes: not applicable.
| initBackendPolicy(); | ||
| initSchemaParams(); | ||
|
|
||
| long startTime = System.currentTimeMillis(); |
There was a problem hiding this comment.
This table-metadata timer is still skipped for Hudi tables with no completed instant. In doInitialize(), after loading the timeline, the !snapshotInstant.isPresent() branch sets prunedPartitions and returns before reaching addExternalTableGetTableMetaTime(...). That path has already done Hudi client/timeline metadata work (getHudiClient(), reloadActiveTimeline(), getCommitsAndCompactionTimeline()), but the new External Table Get Table Meta Time and aggregate external meta time remain N/A/zero for an empty Hudi table. Please record this with a finally around the initialized section or add the metric before that early return.
### What problem does this PR solve? Issue Number: close apache#25867 Related PR: apache#63648 Problem Summary: Hudi scan initialization returned early for tables with no completed instant before the external table metadata timer was recorded. Move the table metadata profile accounting into a finally block so metadata work such as client creation, timeline reload, remote table field reads, and timeline lookup is included for the empty-table path as well. ### Release note None ### Check List (For Author) - Test: git diff --check; git diff --cached --check; attempted ./run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTest, but it failed because JAVA_HOME points to JDK 11 and JDK_17 is not set - Behavior changed: No - Does this need documentation: No
|
/review |
There was a problem hiding this comment.
Reviewed the current head for the external table metadata profiling changes across SummaryProfile, FileQueryScanNode, Hive/Hudi/Iceberg/Paimon scan paths, and the SummaryProfile unit test. I did not find additional blocking issues distinct from the existing review threads. The previously raised Hudi empty-table, Hudi isBatchMode partition initialization, and Iceberg lazy planFiles timing concerns appear addressed in the current code.
Critical checkpoint conclusions:
- Goal and proof: The PR adds detailed external table metadata timing buckets and wires them into Hive, Hudi, Iceberg, and Paimon planning/split paths. SummaryProfile has unit coverage for aggregation/display of the new fields.
- Scope and clarity: The change is focused on profile accounting and keeps existing scan behavior largely unchanged.
- Concurrency: The main concurrency-sensitive additions are async split planning updates to SummaryProfile. The accumulator methods are synchronized, and scan nodes keep a SummaryProfile reference so async threads do not depend on ThreadLocal ConnectContext.
- Lifecycle/static initialization: No new static initialization dependency or non-obvious lifecycle ownership issue found.
- Configuration/compatibility: No new config, protocol field, persistence format, or rolling-upgrade compatibility concern found.
- Parallel paths: Hive, Hudi, Iceberg, and Paimon paths are covered. I also checked batch and non-batch split paths for the changed code.
- Tests: Only SummaryProfile unit coverage is added. I did not run FE tests in this runner; author notes JDK 17 was unavailable. No additional user-provided review focus was specified.
|
run buildall |
### What problem does this PR solve? Issue Number: close apache#25867 Related PR: apache#63648 Problem Summary: Hive partition initialization caught AnalysisException even though the profiled try block no longer calls code that throws that checked exception. Remove the unreachable catch block to fix Java compilation while keeping runtime exception profiling unchanged. ### Release note None ### Check List (For Author) - Test: git diff --check; git diff --cached --check - Behavior changed: No - Does this need documentation: No
|
run buildall |
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: close #25867
Related PR: #xxx
Problem Summary: External table queries previously showed only coarse FE metadata time in profile, making it hard to locate slow metadata access steps for Hive, Iceberg, Hudi, and Paimon scans. This change records dedicated profile timings for external partition value loading, partition metadata loading, partition file listing, and file scan task planning. The scan nodes keep a SummaryProfile reference so asynchronous split planning can also report its metadata time.
Release note
Expose more detailed FE external table metadata timing in query profile.
Check List (For Author)
git diff --checkandgit diff --cached --checkpassed../run-fe-ut.sh --run org.apache.doris.common.profile.SummaryProfileTestwas attempted but could not run because this environment only has JDK 11 and the FE test script requires JDK 17.What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)