[fix](fe) Return unknown stats for system tables #62913
Merged
morrySnow merged 1 commit into apache:master on May 6, 2026
Author: run buildall

Author: /review
morrySnow reviewed on Apr 29, 2026, commenting on lines +274 to +280:

```java
if (scan instanceof OlapScan) {
    OlapScan olapScan = (OlapScan) scan;
    // Keep this consistent with computeOlapScan: system tables use UNKNOWN stats.
    if (StatisticConstants.isSystemTable(olapScan.getTable())) {
        continue;
    }
}
```
morrySnow: Revert this file's change.
Author: Done. I reverted the `StatsCalculator.java` change and removed the related `StatsCalculatorTest` coverage. The PR now keeps only the `StatisticsCache.OlapTableStatistics` accessor guard, so system-table column and partition-column stats return UNKNOWN there while normal user-table validation is unchanged.
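The accessor guard described in that reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the Doris code: `ColStat`, `TableHandle`, and `GuardedOlapTableStatistics` are hypothetical types, the `__internal_schema` name check is an assumed stand-in for `StatisticConstants.isSystemTable`, and a `Function` stands in for the real stats cache. The point it shows is that the check happens before any cache lookup, so a system table never schedules a stats load.

```java
import java.util.function.Function;

// Hypothetical stand-in for a column statistic; UNKNOWN means "no usable stats".
final class ColStat {
    static final ColStat UNKNOWN = new ColStat(-1);
    final double ndv;

    ColStat(double ndv) {
        this.ndv = ndv;
    }

    boolean isUnknown() {
        return this == UNKNOWN;
    }
}

// Hypothetical table handle; only the database and table names matter here.
final class TableHandle {
    final String db;
    final String name;

    TableHandle(String db, String name) {
        this.db = db;
        this.name = name;
    }
}

// Sketch of an accessor that refuses to load stats for system tables.
final class GuardedOlapTableStatistics {
    private final TableHandle table;
    private final Function<String, ColStat> cacheLoader; // hypothetical cache lookup
    int loadRequests = 0;

    GuardedOlapTableStatistics(TableHandle table, Function<String, ColStat> cacheLoader) {
        this.table = table;
        this.cacheLoader = cacheLoader;
    }

    // Assumption: a database-name check stands in for StatisticConstants.isSystemTable.
    static boolean isSystemTable(TableHandle t) {
        return "__internal_schema".equals(t.db);
    }

    ColStat getColumnStatistics(String column) {
        return lookup(column);
    }

    ColStat getPartitionColumnStatistics(String partition, String column) {
        return lookup(partition + "/" + column);
    }

    private ColStat lookup(String key) {
        if (isSystemTable(table)) {
            // Return UNKNOWN directly so planning never schedules a stats load
            // for the statistics tables themselves.
            return ColStat.UNKNOWN;
        }
        loadRequests++;
        return cacheLoader.apply(table.db + "." + table.name + "." + key);
    }
}
```

For a `column_statistics` handle both accessors return `ColStat.UNKNOWN` and `loadRequests` stays at 0, while a user table such as `db1.orders` still goes through the loader. Guarding the accessor, rather than one caller such as `StatsCalculator`, also covers any other caller that reaches system-table stats through it.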
### What problem does this PR solve?

Issue Number: N/A

Related PR: introduced by apache#41790

Problem Summary: Planning internal statistics SQL that scans system tables such as `__internal_schema.column_statistics` could recursively request column statistics for the statistics table itself. `OlapTableStatistics` now returns UNKNOWN for system-table column and partition-column stats, preventing the stats cache accessor from loading system-table stats recursively while leaving normal user-table validation unchanged.

### Release note

None

### Check List (For Author)

- Test: Unit Test
  - `CUSTOM_MVN=/home/yujun/.sdkman/candidates/maven/current/bin/mvn bash ./run-fe-ut.sh --run org.apache.doris.statistics.StatisticsCacheTest`
- Behavior changed: No
- Does this need documentation: No
The branch was force-pushed from 5c8a894 to bd632d3.
Author: run buildall

Contributor: run buildall
morrySnow approved these changes on May 6, 2026.
Contributor: PR approved by at least one committer and no changes requested.

Contributor: PR approved by anyone and no changes requested.
github-actions bot pushed a commit that referenced this pull request on May 6, 2026:
### What problem does this PR solve?

Related PR: introduced by #41790

Problem Summary:

We encountered a case where manually dropping column stats tablet files caused many internal queries against `__internal_schema.column_statistics`, including queries whose target stats IDs belong to the statistics table itself. The visible symptom is a self-amplifying internal-query storm such as:

```sql
SELECT * FROM __internal_schema.column_statistics WHERE id IN (... column_statistics own column stats ids ...)
```

The problematic call chain is:

```text
ColumnStatisticsCacheLoader.doLoad
  -> StatisticsRepository.loadColStats
  -> StatisticsUtil.execStatisticQuery
  -> StmtExecutor.executeInternalQuery
  -> NereidsPlanner.optimize
  -> InitJoinOrder
  -> StatsCalculator.disableJoinReorderIfStatsInvalid
  -> StatsCalculator.checkNdvValidation
  -> StatisticsCache.OlapTableStatistics.getColumnStatistics
```

When the internal stats-loading SQL scans `__internal_schema.column_statistics`, join reorder's pre-validation path runs before `computeOlapScan()` derives UNKNOWN stats for system tables. That pre-validation can request column statistics for `column_statistics` itself. If the stats load fails, the async stats cache does not retain a useful value for the failed key, so repeated planning can reload the same system-table stats and amplify the internal query volume.

This is separate from the audit `State=OK` display issue: those audit rows can look successful even when the internal stats query failed. The bug fixed here is the recursive system-table stats lookup during planning.

This PR returns UNKNOWN from the `StatisticsCache.OlapTableStatistics` column and partition-column accessors for system tables. That keeps system-table behavior consistent with `computeOlapScan()` and prevents future callers of this accessor from accidentally loading system-table stats before the normal scan-statistics guard runs.

The fix intentionally does not skip all internal queries. Internal jobs that scan normal user tables, such as import, MV refresh, or internal insert tasks, can still use normal stats validation and optimization.
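The feedback loop described above can be reproduced with a small single-threaded simulation. Everything in it is a hypothetical model rather than Doris' cache: a pending set stands in for the async cache's in-flight loads, every load is assumed to fail (mirroring the dropped tablet files) so nothing is ever cached, and `SYSTEM_COLUMNS` is an invented pair of keys for `column_statistics`' own columns. Each load counts as one internal query whose planning pre-validates the stats of the table it scans, i.e. `column_statistics` itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StatsStormSimulation {
    // Hypothetical cache keys for the statistics table's own columns.
    static final List<String> SYSTEM_COLUMNS =
            Arrays.asList("column_statistics.id", "column_statistics.ndv");

    private final boolean guardSystemTables;
    private final Map<String, Double> cache = new HashMap<>();
    private final Set<String> pending = new LinkedHashSet<>(); // in-flight loads
    int internalQueries = 0;

    StatsStormSimulation(boolean guardSystemTables) {
        this.guardSystemTables = guardSystemTables;
    }

    // Accessor: a miss schedules a load and returns null (UNKNOWN) for now.
    Double get(String key) {
        if (guardSystemTables && SYSTEM_COLUMNS.contains(key)) {
            return null; // UNKNOWN without scheduling any load
        }
        Double value = cache.get(key);
        if (value == null) {
            pending.add(key); // no-op if a load for this key is already pending
        }
        return value;
    }

    // One load = one internal query that scans column_statistics.
    private void runLoad(String key) {
        internalQueries++;
        for (String col : SYSTEM_COLUMNS) {
            get(col); // join-reorder pre-validation of the scanned table's stats
        }
        // Assumption: the load fails, so nothing is cached for `key`.
    }

    // Drains pending loads, stopping at maxQueries; returns the query count.
    int run(String userKey, int maxQueries) {
        get(userKey);
        while (!pending.isEmpty() && internalQueries < maxQueries) {
            String next = pending.iterator().next();
            pending.remove(next);
            runLoad(next);
        }
        return internalQueries;
    }
}
```

Without the guard, every failed load re-requests the system-table keys, so the pending set never empties and `run("orders.o_id", 100)` stops only at the cap of 100 queries. With the guard, the single load for the user key requests nothing further and the run ends after one internal query.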
### What problem does this PR solve?

Issue Number: N/A

Related PR: introduced by #41790

Problem Summary:

We encountered a case where manually dropping column stats tablet files caused many internal queries against `__internal_schema.column_statistics`, including queries whose target stats IDs belong to the statistics table itself. The visible symptom is a self-amplifying internal-query storm such as `SELECT * FROM __internal_schema.column_statistics WHERE id IN (... column_statistics own column stats ids ...)`. The problematic call chain runs from `ColumnStatisticsCacheLoader.doLoad` through `StatisticsRepository.loadColStats`, `StatisticsUtil.execStatisticQuery`, `StmtExecutor.executeInternalQuery`, `NereidsPlanner.optimize`, `InitJoinOrder`, `StatsCalculator.disableJoinReorderIfStatsInvalid`, and `StatsCalculator.checkNdvValidation` to `StatisticsCache.OlapTableStatistics.getColumnStatistics`.

When the internal stats-loading SQL scans `__internal_schema.column_statistics`, join reorder's pre-validation path runs before `computeOlapScan()` derives UNKNOWN stats for system tables. That pre-validation can request column statistics for `column_statistics` itself. If the stats load fails, the async stats cache does not retain a useful value for the failed key, so repeated planning can reload the same system-table stats and amplify the internal query volume.

This is separate from the audit `State=OK` display issue: those audit rows can look successful even when the internal stats query failed. The bug fixed here is the recursive system-table stats lookup during planning.

This PR returns UNKNOWN from the `StatisticsCache.OlapTableStatistics` column and partition-column accessors for system tables. That keeps system-table behavior consistent with `computeOlapScan()` and prevents future callers of this accessor from accidentally loading system-table stats before the normal scan-statistics guard runs.

The fix intentionally does not skip all internal queries. Internal jobs that scan normal user tables, such as import, MV refresh, or internal insert tasks, can still use normal stats validation and optimization.
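The design choice in the last paragraph, keying the guard on the scanned table rather than on where the query came from, can be contrasted with two predicates. This is a hypothetical sketch: `QueryOrigin`, `StatsSkipPolicies`, and both rule names are invented for illustration, and the `__internal_schema` check is an assumed stand-in for `StatisticConstants.isSystemTable`.

```java
import java.util.function.BiPredicate;

// Hypothetical classification of who issued the query.
enum QueryOrigin { USER, INTERNAL }

final class StatsSkipPolicies {
    // Assumption: system tables are exactly those in __internal_schema.
    static boolean isSystemTable(String db) {
        return "__internal_schema".equals(db);
    }

    // Broader alternative the PR avoids: every internal query gets UNKNOWN stats,
    // including import, MV refresh, and internal insert jobs on user tables.
    static final BiPredicate<QueryOrigin, String> SKIP_ALL_INTERNAL =
            (origin, db) -> origin == QueryOrigin.INTERNAL;

    // Narrower rule the PR describes: only system tables get UNKNOWN stats,
    // whoever issued the query.
    static final BiPredicate<QueryOrigin, String> SKIP_SYSTEM_TABLES =
            (origin, db) -> isSystemTable(db);
}
```

Both rules return `true` for the internal stats-loading query on `__internal_schema`, which is what breaks the loop. They differ on an internal MV refresh that scans a user table such as `sales`: `SKIP_ALL_INTERNAL` would discard its stats and weaken its plan, while `SKIP_SYSTEM_TABLES` leaves normal validation and optimization in place.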
### Release note

None

### Check List (For Author)

- `CUSTOM_MVN=/home/yujun/.sdkman/candidates/maven/current/bin/mvn bash ./run-fe-ut.sh --run org.apache.doris.statistics.StatisticsCacheTest`