[fix](fe) Return unknown stats for system tables#62913

Merged
morrySnow merged 1 commit into apache:master from yujun777:repro/cir-20019-stats-self-loop on May 6, 2026

@yujun777 yujun777 commented Apr 28, 2026

What problem does this PR solve?

Issue Number: N/A

Related PR: introduced by #41790

Problem Summary:

We encountered a case in which manually dropping the column stats tablet files triggered many internal queries against __internal_schema.column_statistics, including queries whose target stats IDs belong to the statistics table itself. The visible symptom is a self-amplifying storm of internal queries such as:

SELECT * FROM __internal_schema.column_statistics
WHERE id IN (... column_statistics own column stats ids ...)

The problematic call chain is:

ColumnStatisticsCacheLoader.doLoad
  -> StatisticsRepository.loadColStats
  -> StatisticsUtil.execStatisticQuery
  -> StmtExecutor.executeInternalQuery
  -> NereidsPlanner.optimize
  -> InitJoinOrder
  -> StatsCalculator.disableJoinReorderIfStatsInvalid
  -> StatsCalculator.checkNdvValidation
  -> StatisticsCache.OlapTableStatistics.getColumnStatistics

When the internal stats-loading SQL scans __internal_schema.column_statistics, join-reorder's pre-validation path runs before computeOlapScan() derives UNKNOWN stats for system tables. That pre-validation can request column statistics for column_statistics itself. If the stats load fails, the async stats cache does not retain a useful value for the failed key, so repeated planning can reload the same system-table stats and amplify the internal query volume.
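
The amplification described above can be illustrated with a toy cache. This is a minimal sketch, not the Doris implementation: `ToyAsyncStatsCache`, `FailedLoadDemo`, and the key string are hypothetical names, and the loader runs synchronously so the counts are deterministic. It only shows that when a failed load leaves no retained entry, every later lookup of the same key triggers another load.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Toy stand-in for an async stats cache; not the Doris StatisticsCache.
class ToyAsyncStatsCache {
    private final Map<String, CompletableFuture<String>> entries = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    final AtomicInteger loadCount = new AtomicInteger();

    ToyAsyncStatsCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns the loaded value, or "UNKNOWN" when the load failed.
    String get(String key) {
        CompletableFuture<String> future = entries.computeIfAbsent(key, k -> {
            loadCount.incrementAndGet();
            CompletableFuture<String> f = new CompletableFuture<>();
            try {
                f.complete(loader.apply(k));
            } catch (RuntimeException e) {
                f.completeExceptionally(e);
            }
            return f;
        });
        if (future.isCompletedExceptionally()) {
            // A failed load is dropped, so the next lookup loads again.
            entries.remove(key, future);
            return "UNKNOWN";
        }
        return future.join();
    }
}

class FailedLoadDemo {
    // Performs `lookups` reads of one key and reports how many loads ran.
    static int loadsAfter(int lookups, boolean loaderFails) {
        ToyAsyncStatsCache cache = new ToyAsyncStatsCache(key -> {
            if (loaderFails) {
                throw new IllegalStateException("stats tablet missing");
            }
            return "stats-for-" + key;
        });
        for (int i = 0; i < lookups; i++) {
            cache.get("column_statistics.id");
        }
        return cache.loadCount.get();
    }

    public static void main(String[] args) {
        System.out.println(loadsAfter(5, false)); // prints 1: one load, then cache hits
        System.out.println(loadsAfter(5, true));  // prints 5: every lookup reloads
    }
}
```

In the real system each of those reloads is itself an internal query that gets planned, which is why the recursion through planning multiplies the load.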

This is separate from the audit State=OK display issue: those audit rows can look successful even when the internal stats query failed. The bug fixed here is the recursive system-table stats lookup during planning.

This PR returns UNKNOWN from StatisticsCache.OlapTableStatistics column and partition-column accessors for system tables. That keeps the system-table behavior consistent with computeOlapScan() and prevents future callers through this accessor from accidentally loading system-table stats before the normal scan-statistics guard.
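
The shape of such an accessor guard can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `SketchTable`, `SketchTableStatistics`, the string-valued stats, and the rule that a table is a system table when its database is `__internal_schema` are all simplifications introduced here. The real check in the PR goes through `StatisticConstants.isSystemTable`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Simplified stand-ins; none of these are the real Doris classes.
final class SketchTable {
    final String dbName;
    final String tableName;

    SketchTable(String dbName, String tableName) {
        this.dbName = dbName;
        this.tableName = tableName;
    }

    // Assumption for this sketch: only __internal_schema counts as "system".
    boolean isSystemTable() {
        return "__internal_schema".equals(dbName);
    }
}

final class SketchTableStatistics {
    static final String UNKNOWN = "UNKNOWN";

    private final SketchTable table;
    private final Function<String, String> cacheLookup;
    final AtomicInteger cacheLookups = new AtomicInteger();

    SketchTableStatistics(SketchTable table, Function<String, String> cacheLookup) {
        this.table = table;
        this.cacheLookup = cacheLookup;
    }

    String getColumnStatistics(String columnName) {
        // Guard first: system tables never reach the cache, so no stats
        // query against the statistics table can be triggered from here.
        if (table.isSystemTable()) {
            return UNKNOWN;
        }
        cacheLookups.incrementAndGet();
        return cacheLookup.apply(table.tableName + "." + columnName);
    }
}

class AccessorGuardDemo {
    public static void main(String[] args) {
        Function<String, String> lookup = key -> "loaded:" + key;

        SketchTableStatistics sys = new SketchTableStatistics(
                new SketchTable("__internal_schema", "column_statistics"), lookup);
        System.out.println(sys.getColumnStatistics("id")); // prints UNKNOWN

        SketchTableStatistics user = new SketchTableStatistics(
                new SketchTable("sales", "orders"), lookup);
        System.out.println(user.getColumnStatistics("amount")); // prints loaded:orders.amount
    }
}
```

Placing the check inside the accessor rather than at one call site means any caller that goes through it gets the same behavior, while lookups for user tables are untouched.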

The fix intentionally does not skip all internal queries. Internal jobs that scan normal user tables, such as import, MV refresh, or internal insert tasks, can still use normal stats validation and optimization.

Release note

None

Check List (For Author)

  • Test: Unit Test
    • CUSTOM_MVN=/home/yujun/.sdkman/candidates/maven/current/bin/mvn bash ./run-fe-ut.sh --run org.apache.doris.statistics.StatisticsCacheTest
  • Behavior changed: No
  • Does this need documentation: No

@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

/review

Comment on lines +274 to +280
if (scan instanceof OlapScan) {
    OlapScan olapScan = (OlapScan) scan;
    // Keep this consistent with computeOlapScan: system tables use UNKNOWN stats.
    if (StatisticConstants.isSystemTable(olapScan.getTable())) {
        continue;
    }
}
Contributor

Revert the changes to this file.

Contributor Author

Done. I reverted the StatsCalculator.java change and removed the related StatsCalculatorTest coverage. The PR now keeps only the StatisticsCache.OlapTableStatistics accessor guard, so system-table column and partition-column stats return UNKNOWN there while normal user-table validation is unchanged.

### What problem does this PR solve?

Issue Number: N/A

Related PR: introduced by apache#41790

Problem Summary: Planning internal statistics SQL that scans system tables such as __internal_schema.column_statistics could recursively request column statistics for the statistics table itself. OlapTableStatistics now returns UNKNOWN for system-table column and partition-column stats, preventing the stats cache accessor from loading system-table stats recursively while leaving normal user-table validation unchanged.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - CUSTOM_MVN=/home/yujun/.sdkman/candidates/maven/current/bin/mvn bash ./run-fe-ut.sh --run org.apache.doris.statistics.StatisticsCacheTest
- Behavior changed: No
- Does this need documentation: No
@yujun777 yujun777 force-pushed the repro/cir-20019-stats-self-loop branch from 5c8a894 to bd632d3 on May 5, 2026 06:44
@yujun777 yujun777 changed the title from "[fix](fe) Skip system table stats validation" to "[fix](fe) Return unknown stats for system tables" on May 5, 2026
@yujun777
Contributor Author

yujun777 commented May 5, 2026

run buildall

1 similar comment
@morrySnow
Contributor

run buildall

@github-actions github-actions Bot added the "approved" label (indicates a PR has been approved by one committer) on May 6, 2026
@github-actions
Contributor

github-actions Bot commented May 6, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions Bot commented May 6, 2026

PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit 138da30 into apache:master May 6, 2026
31 of 32 checks passed
github-actions Bot pushed a commit that referenced this pull request May 6, 2026
### What problem does this PR solve?

Related PR: introduced by #41790

Problem Summary:

We encountered a case in which manually dropping the column stats
tablet files triggered many internal queries against
`__internal_schema.column_statistics`, including queries whose target
stats IDs belong to the statistics table itself. The visible symptom is
a self-amplifying storm of internal queries such as:

```sql
SELECT * FROM __internal_schema.column_statistics
WHERE id IN (... column_statistics own column stats ids ...)
```

The problematic call chain is:

```text
ColumnStatisticsCacheLoader.doLoad
  -> StatisticsRepository.loadColStats
  -> StatisticsUtil.execStatisticQuery
  -> StmtExecutor.executeInternalQuery
  -> NereidsPlanner.optimize
  -> InitJoinOrder
  -> StatsCalculator.disableJoinReorderIfStatsInvalid
  -> StatsCalculator.checkNdvValidation
  -> StatisticsCache.OlapTableStatistics.getColumnStatistics
```

When the internal stats-loading SQL scans
`__internal_schema.column_statistics`, join-reorder's pre-validation
path runs before `computeOlapScan()` derives UNKNOWN stats for system
tables. That pre-validation can request column statistics for
`column_statistics` itself. If the stats load fails, the async stats
cache does not retain a useful value for the failed key, so repeated
planning can reload the same system-table stats and amplify the internal
query volume.

This is separate from the audit `State=OK` display issue: those audit
rows can look successful even when the internal stats query failed. The
bug fixed here is the recursive system-table stats lookup during
planning.

This PR returns UNKNOWN from `StatisticsCache.OlapTableStatistics`
column and partition-column accessors for system tables. That keeps the
system-table behavior consistent with `computeOlapScan()` and prevents
future callers through this accessor from accidentally loading
system-table stats before the normal scan-statistics guard.

The fix intentionally does not skip all internal queries. Internal jobs
that scan normal user tables, such as import, MV refresh, or internal
insert tasks, can still use normal stats validation and optimization.
Labels

approved (indicates a PR has been approved by one committer), dev/4.0.x, dev/4.1.x, reviewed

2 participants