[refactor](fe) Refactor SummaryProfile to use QueryTrace/ProfileSpan dynamic tracing #61603
Open
morningman wants to merge 5 commits into apache:master
### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: SummaryProfile uses hardcoded timestamp fields and manual setXxxTime() calls scattered across the codebase. This makes adding new metrics error-prone, requires synchronized blocks for each accumulator, tightly couples consumers (NereidsPlanner, PartitionPruner, etc.) to SummaryProfile internals, and obscures parent-child relationships between timing phases.

### Release note

None

### Check List (For Author)

- Test: No need to test (pure refactoring, runtime behavior unchanged; legacy fields preserved for backward compatibility during migration)
- Behavior changed: No
- Does this need documentation: No

---

Introduces two new classes:

- **ProfileSpan** (AutoCloseable): A timed span that automatically records duration and manages the ThreadLocal span stack. Includes NO_OP sentinel for null-safe usage in UT contexts.
- **QueryTrace**: Per-query tracing container with three metric types (Span, Counter, Text). Uses ThreadLocal span stack for automatic parent detection, LinkedHashMap for insertion-order display, and ReentrantReadWriteLock for thread safety. Supports both try-with-resources spans and pre-computed recordDuration() for metrics like GC time.

Migrated callers:

- NereidsPlanner: collectAndLockTable, analyze, rewrite, preMaterializedViewRewrite, optimize, splitFragments (translate), doDistribute, GC time
- AbstractMaterializedViewRule: MV rewrite time, collect table partition time
- PartitionPruner: partition prune time
- FoldConstantRuleOnBE: BE fold constant time
- ExternalFileTableValuedFunction: TVF init time

Supporting changes:

- RuntimeProfile: self-contained indentation via addInfoString(key, value, indent)
- SummaryProfile: QueryTrace field + populateProfile() integration
- StatementContext: getQueryTrace() convenience accessor
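To make the span mechanics described above concrete, here is a minimal sketch of a scoped span plus a per-query container. It is not the PR's code: `SketchSpan`, `SketchTrace`, `finish`, `getParent`, and `metricNames` are hypothetical names, and the metric keys are illustrative. It only shows how try-with-resources, a ThreadLocal stack of open spans, a LinkedHashMap, and a ReentrantReadWriteLock could fit together, assuming durations are accumulated per metric name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SpanDemo {
    public static void main(String[] args) {
        SketchTrace trace = new SketchTrace();
        try (SketchSpan plan = trace.startSpan("Plan Time")) {
            try (SketchSpan analyze = trace.startSpan("Analysis Time")) {
                // analysis work would run here; the parent is detected automatically
            }
        }
        trace.recordDuration("GC Time", 7); // pre-computed duration, no span needed
        for (String name : trace.metricNames()) {
            System.out.println(name + " parent=" + trace.getParent(name)
                    + " ms=" + trace.getDurationMs(name));
        }
    }
}

/** Hypothetical stand-in for ProfileSpan: a timer closed by try-with-resources. */
final class SketchSpan implements AutoCloseable {
    /** Sentinel for call sites that have no trace, for example unit tests. */
    static final SketchSpan NO_OP = new SketchSpan(null, "", null);

    private final SketchTrace trace;
    final String name;
    final String parentName; // null for a root span
    private final long startNanos = System.nanoTime();

    SketchSpan(SketchTrace trace, String name, String parentName) {
        this.trace = trace;
        this.name = name;
        this.parentName = parentName;
    }

    @Override
    public void close() {
        if (trace != null) { // NO_OP has no trace and records nothing
            trace.finish(this, (System.nanoTime() - startNanos) / 1_000_000L);
        }
    }
}

/** Hypothetical stand-in for QueryTrace, reduced to span durations. */
final class SketchTrace {
    // Per-thread stack of open spans; its top is the parent of the next span.
    private final ThreadLocal<Deque<SketchSpan>> openSpans =
            ThreadLocal.withInitial(ArrayDeque::new);
    // LinkedHashMap keeps metrics in the order they were first registered.
    private final Map<String, Long> durationsMs = new LinkedHashMap<>();
    private final Map<String, String> parents = new LinkedHashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    SketchSpan startSpan(String name) {
        Deque<SketchSpan> stack = openSpans.get();
        SketchSpan parent = stack.peek();
        SketchSpan span = new SketchSpan(this, name, parent == null ? null : parent.name);
        stack.push(span);
        lock.writeLock().lock();
        try {
            durationsMs.putIfAbsent(name, 0L); // register at start to fix display order
            parents.putIfAbsent(name, span.parentName);
        } finally {
            lock.writeLock().unlock();
        }
        return span;
    }

    void finish(SketchSpan span, long elapsedMs) {
        openSpans.get().remove(span); // normally the top of this thread's stack
        recordDuration(span.name, elapsedMs);
    }

    /** Adds a duration to the named metric, creating it if needed. */
    void recordDuration(String name, long ms) {
        lock.writeLock().lock();
        try {
            durationsMs.merge(name, ms, Long::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    long getDurationMs(String name) {
        lock.readLock().lock();
        try {
            return durationsMs.getOrDefault(name, -1L);
        } finally {
            lock.readLock().unlock();
        }
    }

    String getParent(String name) {
        lock.readLock().lock();
        try {
            return parents.get(name);
        } finally {
            lock.readLock().unlock();
        }
    }

    List<String> metricNames() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(durationsMs.keySet());
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because the span is closed by the `try` block itself, a start time can never be paired with the wrong finish time, which is the failure mode the hand-written setter pairs allowed.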
…dge getters

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Phase 2 of SummaryProfile refactoring: migrate external data source profiling (FileQueryScanNode, HiveScanNode, HiveTableSink) to use QueryTrace.recordDuration instead of setting legacy timestamp pairs on SummaryProfile.

Also bridge existing SummaryProfile getter methods (used by MetricRepo histograms and plan debug output) to read from QueryTrace first, falling back to legacy timestamp fields. This ensures no regression in monitoring metrics while the migration progresses.

### Release note

None

### Check List (For Author)

- Test: No need to test (pure refactoring, backward compatible via fallback getters)
- Behavior changed: No
- Does this need documentation: No

---

Changes:

SummaryProfile:

- Add getTraceDurationMs() bridge helper
- Update all Nereids getXxxTimeMs() to read QueryTrace first
- Update getPrettyNereidsXxx() to delegate to getXxxTimeMs()
- Update getInitScanNodeTimeMs/getFinalizeScanNodeTimeMs/getCreateScanRangeTimeMs
- Remove duplicate Nereids entries from updateExecutionSummaryProfile
- Fix addNereidsPartitiionPruneTime (was incorrectly mutating externalTvfInitTime)

FileQueryScanNode:

- Replace setInitScanNodeStartTime/FinishTime with recordDuration
- Replace setFinalizeScanNodeStartTime/FinishTime with recordDuration
- Replace setGetSplitsStartTime/FinishTime with recordDuration
- Replace setCreateScanRangeFinishTime with recordDuration
- Add recordDurationToTrace helper

HiveScanNode:

- Replace setGetPartitionsFinishTime with QueryTrace recordDuration
- Replace setGetPartitionFilesFinishTime with QueryTrace recordDuration

HiveTableSink:

- Replace setSinkGetPartitionsStartTime/FinishTime with QueryTrace recordDuration
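The getter bridge described in this commit (read from QueryTrace first, fall back to legacy timestamps) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `MiniTrace`, `MiniProfile`, the metric key, the legacy field names, and the convention that `-1` means "not recorded" are all hypothetical; only the helper name `getTraceDurationMs` comes from the commit message, and its real signature is not shown there.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BridgeDemo {
    public static void main(String[] args) {
        // Legacy-only caller: no trace attached, timestamps set by hand.
        MiniProfile legacy = new MiniProfile(null);
        legacy.legacyStartMs = 100;
        legacy.legacyFinishMs = 130;
        System.out.println(legacy.getAnalysisTimeMs());

        // Migrated caller: the trace value wins over any legacy timestamps.
        MiniTrace trace = new MiniTrace();
        trace.recordDuration(MiniProfile.ANALYSIS, 12);
        MiniProfile migrated = new MiniProfile(trace);
        System.out.println(migrated.getAnalysisTimeMs());
    }
}

/** Hypothetical, single-threaded stand-in for the durations held by QueryTrace. */
final class MiniTrace {
    private final Map<String, Long> durationsMs = new LinkedHashMap<>();

    void recordDuration(String name, long ms) {
        durationsMs.merge(name, ms, Long::sum);
    }

    /** Returns -1 when the metric was never recorded (assumed convention). */
    long getDurationMs(String name) {
        return durationsMs.getOrDefault(name, -1L);
    }
}

/** Hypothetical slice of a profile class in the middle of the migration. */
final class MiniProfile {
    static final String ANALYSIS = "Nereids Analysis Time"; // illustrative key

    private final MiniTrace trace; // may be null for un-migrated callers
    long legacyStartMs = -1;
    long legacyFinishMs = -1;

    MiniProfile(MiniTrace trace) {
        this.trace = trace;
    }

    /** Bridge helper: -1 signals that the caller should fall back. */
    long getTraceDurationMs(String name) {
        return trace == null ? -1 : trace.getDurationMs(name);
    }

    long getAnalysisTimeMs() {
        long fromTrace = getTraceDurationMs(ANALYSIS);
        if (fromTrace >= 0) {
            return fromTrace; // migrated path
        }
        if (legacyStartMs < 0 || legacyFinishMs < 0) {
            return -1; // neither source has data
        }
        return legacyFinishMs - legacyStartMs; // legacy timestamp pair
    }
}
```

Keeping the fallback inside the getter means consumers such as metric histograms keep calling the same method throughout the migration.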
### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Phase 3 of SummaryProfile refactoring: migrate the outer container timings (StmtExecutor, ConnectProcessor, AbstractInsertExecutor) and HMSTransaction filesystem/HMS profiling to use QueryTrace.

This eliminates the shared tempStarTime/freshXxx pattern in HMSTransaction and records all durations directly via QueryTrace.recordDuration().

SummaryProfile.updateExecutionSummaryProfile is restructured: migrated metrics are only emitted via populateProfile() when QueryTrace is available, falling back to legacy timeline output when null.

### Release note

None

### Check List (For Author)

- Test: No need to test (pure refactoring, backward compatible via fallback)
- Behavior changed: No
- Does this need documentation: No

---

Changes:

StmtExecutor:

- Record Plan Time, Parse SQL Time, Schedule Time, Fetch/Write Result Time durations in QueryTrace alongside legacy timestamp setters
- Record Executed By Frontend text via QueryTrace.setText
- Record Query Begin Time via QueryTrace.setText

ConnectProcessor:

- Record Parse SQL Time duration in QueryTrace after setting legacy timestamps

AbstractInsertExecutor:

- Record Schedule Time and Fetch Result Time in QueryTrace

HMSTransaction:

- Replace Optional<SummaryProfile> with Optional<QueryTrace>
- Migrate waitForAsyncFileSystemTasks, doAddPartitionsTask, doUpdateStatisticsTasks, wrapperRename/Delete methods
- Use QueryTrace.recordDuration and addCounter instead of tempStarTime

SummaryProfile:

- Restructure updateExecutionSummaryProfile: migrated metrics only emit legacy entries when queryTrace is null
- Bridge getPlanTimeMs and getScheduleTimeMs through getTraceDurationMs
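As a rough illustration of replacing a shared start-time field with a local timer that reports into an optional trace, consider the sketch below. The class names `CountingTrace` and `FileOps`, the method `timedRename`, the metric keys, and the `(name, value)` signatures of `recordDuration` and `addCounter` are all assumptions made for the example; the commit message names those two methods but does not show their signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class HmsTimingDemo {
    public static void main(String[] args) {
        CountingTrace trace = new CountingTrace();
        FileOps ops = new FileOps(Optional.of(trace));
        ops.timedRename(() -> { /* a filesystem rename would run here */ });
        ops.timedRename(() -> { });
        System.out.println("renames=" + trace.getCounter("Rename File Count"));

        // Without a trace the operation still runs and nothing is recorded.
        new FileOps(Optional.empty()).timedRename(() -> { });
    }
}

/** Hypothetical stand-in for the duration and counter parts of QueryTrace. */
final class CountingTrace {
    private final Map<String, Long> durationsMs = new HashMap<>();
    private final Map<String, Long> counters = new HashMap<>();

    synchronized void recordDuration(String name, long ms) {
        durationsMs.merge(name, ms, Long::sum);
    }

    synchronized void addCounter(String name, long delta) {
        counters.merge(name, delta, Long::sum);
    }

    synchronized long getDurationMs(String name) {
        return durationsMs.getOrDefault(name, -1L);
    }

    synchronized long getCounter(String name) {
        return counters.getOrDefault(name, 0L);
    }
}

/** Hypothetical wrapper in the style of the wrapperRename/Delete methods. */
final class FileOps {
    private final Optional<CountingTrace> trace;

    FileOps(Optional<CountingTrace> trace) {
        this.trace = trace;
    }

    void timedRename(Runnable rename) {
        long startNanos = System.nanoTime(); // local variable, not a shared field
        try {
            rename.run();
        } finally {
            long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
            trace.ifPresent(t -> {
                t.recordDuration("Filesystem Operation Time", elapsedMs);
                t.addCounter("Rename File Count", 1);
            });
        }
    }
}
```

With the start time held in a local variable, concurrent operations cannot overwrite each other's start timestamp, which a single shared temporary field cannot guarantee.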
…ryProfile

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Phase 4 cleanup of SummaryProfile refactoring. Removes ~250 lines of dead code: all Nereids timing setters (setNereidsAnalysisTime, setNereidsRewriteTime, etc.), data source scan node setters (setInitScanNodeStartTime, setGetSplitsStartTime, etc.), HMS/filesystem setters (freshFilesystemOptTime, setHmsAddPartitionTime, addRenameFileCnt, etc.), and their backing fields. These were fully superseded by QueryTrace in Phases 1-3.

Getters now read exclusively from QueryTrace with no legacy fallback for cleaned-up metrics. The HMS transaction summary block in setTransactionSummary is removed as those metrics are now emitted by QueryTrace.populateProfile().

Also migrates InitMaterializationContextHook from setNereidsCollectTablePartitionFinishTime to QueryTrace, and rewrites SummaryProfileTest to exercise the new QueryTrace API.

### Release note

None

### Check List (For Author)

- Test: Unit Test (SummaryProfileTest rewritten)
- Behavior changed: No
- Does this need documentation: No

---

Removed methods:

- setNereidsLockTableFinishTime, setNereidsAnalysisTime, setNereidsRewriteTime
- setNereidsOptimizeTime, setNereidsTranslateTime, setNereidsDistributeTime
- setNereidsGarbageCollectionTime, sumBeFoldTime
- setNereidsPreRewriteByMvFinishTime, setNereidsCollectTablePartitionFinishTime
- addCollectTablePartitionTime
- setInitScanNodeStartTime/FinishTime, setFinalizeScanNodeStartTime/FinishTime
- setGetSplitsStartTime/FinishTime, setGetPartitionsFinishTime
- setGetPartitionFilesFinishTime, setSinkGetPartitionsStartTime/FinishTime
- setCreateScanRangeFinishTime
- freshFilesystemOptTime, setHmsAddPartitionTime, addHmsAddPartitionCnt
- setHmsUpdatePartitionTime, addHmsUpdatePartitionCnt
- addRenameFileCnt, incRenameDirCnt, incDeleteDirRecursiveCnt, incDeleteFileCnt
- addNereidsMvRewriteTime, addExternalCatalogMetaTime, addExternalTvfInitTime
- addNereidsPartitiionPruneTime

Removed fields:

- nereidsLockTableFinishTime, nereidsAnalysisFinishTime, nereidsRewriteFinishTime
- nereidsOptimizeFinishTime, nereidsTranslateFinishTime, nereidsDistributeFinishTime
- nereidsCollectTablePartitionFinishTime, nereidsCollectTablePartitionTime
- nereidsPreRewriteByMvFinishTime, nereidsGarbageCollectionTime, nereidsBeFoldConstTime
- initScanNodeStartTime/FinishTime, finalizeScanNodeStartTime/FinishTime
- getSplitsStartTime/FinishTime, getPartitionsFinishTime, getPartitionFilesFinishTime
- sinkSetPartitionValuesStartTime/FinishTime, createScanRangeFinishTime
- filesystemOptTime, hmsAddPartitionTime/Cnt, hmsUpdatePartitionTime/Cnt
- filesystemRenameFileCnt/RenameDirCnt/DeleteDirCnt/DeleteFileCnt
- nereidsMvRewriteTime, externalCatalogMetaTime, externalTvfInitTime
- nereidsPartitiionPruneTime
### What problem does this PR solve?

Problem Summary: The updateExecutionSummaryProfile() method had a dead code block guarded by `if (queryTrace == null)` that still referenced deleted scan-node timestamp fields (initScanNodeFinishTime, getSplitsStartTime, etc.). Since queryTrace is always initialized as `new QueryTrace()` (never null), this block was unreachable dead code. Removing it fixes the compilation errors.

### Release note

None

### Check List (For Author)

- Test: No need to test (removes dead code that caused compilation failure)
- Behavior changed: No
- Does this need documentation: No
Thank you for your contribution to Apache Doris. Please clearly describe your PR:

run buildall

TPC-H: Total hot run time: 26723 ms

TPC-DS: Total hot run time: 169103 ms

FE UT Coverage Report: Increment line coverage
Proposed changes
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
The current `SummaryProfile` implementation in Doris FE has several issues:

- … (`nereidsAnalysisFinishTime - nereidsLockTableFinishTime`), where incorrect ordering of setter calls leads to negative durations.
- … `NereidsPlanner`, `StmtExecutor`, `ConnectProcessor`, `FileQueryScanNode`, `HMSTransaction`, etc., all directly calling `SummaryProfile` setters.

This PR introduces a new QueryTrace/ProfileSpan system that replaces the legacy approach:

- `ProfileSpan` implements `AutoCloseable` for safe, scoped timing via try-with-resources.
- `QueryTrace` provides dynamic metric registration via `startSpan()`, `recordDuration()`, `addCounter()`, and `setText()`.

Changes

Phase 1 - Core Infrastructure + NereidsPlanner (c1121e3)

- … `ProfileSpan`, `QueryTrace`
- … `RuntimeProfile` to support indented `addInfoString`

Phase 2 - Data Source + Getter Bridge (00939aa)

- … `FileQueryScanNode`, `HiveScanNode`, `HiveTableSink` timing
- … `getTraceDurationMs()` bridge in SummaryProfile getters

Phase 3 - StmtExecutor + HMSTransaction (6ea75e4)

- … `Optional<SummaryProfile>` with `Optional<QueryTrace>` in HMSTransaction

Phase 4 - Cleanup (bc0af91, d92c6ab)

- … `updateExecutionSummaryProfile()`
- … `SummaryProfileTest` to use QueryTrace API

Release note
None
Check List (For Author)