branch-4.0: [fix](fe) Return unknown stats for system tables #62913 #63009

Open
github-actions[bot] wants to merge 1 commit into branch-4.0 from auto-pick-62913-branch-4.0

@github-actions commented May 6, 2026

Cherry-picked from #62913

### What problem does this PR solve?

Related PR: introduced by #41790

Problem Summary:

We encountered a case where manually deleting the tablet files of the
column statistics table caused a flood of internal queries against
`__internal_schema.column_statistics`, including queries whose target
stats IDs belong to the statistics table itself. The visible symptom is
a self-amplifying internal-query storm such as:

```sql
SELECT * FROM __internal_schema.column_statistics
WHERE id IN (... column_statistics own column stats ids ...)
```

The problematic call chain is:

```text
ColumnStatisticsCacheLoader.doLoad
  -> StatisticsRepository.loadColStats
  -> StatisticsUtil.execStatisticQuery
  -> StmtExecutor.executeInternalQuery
  -> NereidsPlanner.optimize
  -> InitJoinOrder
  -> StatsCalculator.disableJoinReorderIfStatsInvalid
  -> StatsCalculator.checkNdvValidation
  -> StatisticsCache.OlapTableStatistics.getColumnStatistics
```

When the internal stats-loading SQL scans
`__internal_schema.column_statistics`, join-reorder's pre-validation
path runs before `computeOlapScan()` derives UNKNOWN stats for system
tables. That pre-validation can request column statistics for
`column_statistics` itself. If the stats load fails, the async stats
cache does not retain a useful value for the failed key, so repeated
planning can reload the same system-table stats and amplify the internal
query volume.
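To make the amplification concrete, here is a toy model in Java. It is a sketch, not Doris code: the `StatsStormModel` class, the three placeholder stats keys, and the round-based scheduler are all hypothetical. It assumes, as described above, that a failed load leaves nothing cached for its key, and it additionally assumes that the cache deduplicates loads that are already in flight.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StatsStormModel {
    // Hypothetical keys standing in for the column stats of column_statistics itself.
    static final List<String> SELF_STATS_KEYS = List.of("cs.col_a", "cs.col_b", "cs.col_c");

    final Map<String, String> cache = new HashMap<>(); // only successful loads would be kept
    final Set<String> inFlight = new HashSet<>();        // keys with a load already queued
    final Deque<String> pendingLoads = new ArrayDeque<>();
    final boolean guardSystemTables;
    int internalQueries = 0;

    StatsStormModel(boolean guardSystemTables) {
        this.guardSystemTables = guardSystemTables;
    }

    // Models join-reorder pre-validation asking for stats of the scanned system table.
    void planScanOfColumnStatistics() {
        if (guardSystemTables) {
            return; // fixed behavior: system table => UNKNOWN, no load is queued
        }
        for (String key : SELF_STATS_KEYS) {
            if (!cache.containsKey(key) && inFlight.add(key)) {
                pendingLoads.add(key);
            }
        }
    }

    // Runs every load queued at the start of the round; each one fails.
    void runRound() {
        int n = pendingLoads.size();
        for (int i = 0; i < n; i++) {
            String key = pendingLoads.poll();
            internalQueries++;
            // The load SQL scans column_statistics, so planning it re-requests self stats.
            planScanOfColumnStatistics();
            // Tablets are missing: the load fails and nothing is cached for this key.
            inFlight.remove(key);
        }
    }

    public static void main(String[] args) {
        for (boolean guard : new boolean[] {false, true}) {
            StatsStormModel model = new StatsStormModel(guard);
            model.planScanOfColumnStatistics(); // the initial stats-loading query
            for (int round = 0; round < 10; round++) {
                model.runRound();
            }
            System.out.printf("guard=%b internalQueries=%d stillPending=%d%n",
                    guard, model.internalQueries, model.pendingLoads.size());
        }
    }
}
```

In the unguarded run the queue never drains: each failed load's own planning re-queues keys whose loads just failed, so internal queries keep being issued round after round. With the guard enabled, no internal query is issued at all.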

This is separate from the audit `State=OK` display issue: those audit
rows can look successful even when the internal stats query failed. The
bug fixed here is the recursive system-table stats lookup during
planning.

This PR returns UNKNOWN from `StatisticsCache.OlapTableStatistics`
column and partition-column accessors for system tables. That keeps the
system-table behavior consistent with `computeOlapScan()` and prevents
any other caller of these accessors from accidentally loading
system-table stats before the normal scan-statistics guard runs.
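A minimal sketch of the guard, assuming simplified stand-ins: `ColumnStats`, this `OlapTableStatistics` class, the `isSystemTable()` check (here: any table in `__internal_schema`), the accessor signatures, and the `sales_db` database are all hypothetical and do not reflect the real Doris signatures. What the sketch shows is where the check sits: before the cache lookup, so a system table never triggers a stats load.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

public class SystemTableStatsGuard {
    // Hypothetical stand-in for a column-statistics value with an UNKNOWN sentinel.
    record ColumnStats(double ndv, boolean isUnknown) {
        static final ColumnStats UNKNOWN = new ColumnStats(0, true);
    }

    // Hypothetical stand-in for StatisticsCache.OlapTableStatistics.
    static final class OlapTableStatistics {
        private final String dbName;
        private final LongFunction<ColumnStats> cacheLookup; // may trigger an internal query

        OlapTableStatistics(String dbName, LongFunction<ColumnStats> cacheLookup) {
            this.dbName = dbName;
            this.cacheLookup = cacheLookup;
        }

        // Assumption: "system table" means any table in __internal_schema.
        private boolean isSystemTable() {
            return "__internal_schema".equals(dbName);
        }

        ColumnStats getColumnStatistics(long colId) {
            if (isSystemTable()) {
                return ColumnStats.UNKNOWN; // matches what computeOlapScan() derives
            }
            return cacheLookup.apply(colId);
        }

        ColumnStats getPartitionColumnStatistics(String partitionName, long colId) {
            if (isSystemTable()) {
                return ColumnStats.UNKNOWN;
            }
            // Simplification: this sketch ignores partitionName and reuses the column loader.
            return cacheLookup.apply(colId);
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        LongFunction<ColumnStats> loader = colId -> {
            loads.incrementAndGet(); // counts simulated stats-loading queries
            return new ColumnStats(100, false);
        };

        OlapTableStatistics statsTable = new OlapTableStatistics("__internal_schema", loader);
        OlapTableStatistics userTable = new OlapTableStatistics("sales_db", loader);

        System.out.println(statsTable.getColumnStatistics(1).isUnknown());
        System.out.println(statsTable.getPartitionColumnStatistics("p1", 1).isUnknown());
        System.out.println(userTable.getColumnStatistics(1).isUnknown());
        System.out.println("loads issued: " + loads.get());
    }
}
```

Because the check keys on the table being read rather than on whether the query is internal, an internal job that scans a user table (the `sales_db` case) still goes through the normal lookup, which matches the scope described in the next paragraph.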

The fix intentionally does not skip all internal queries. Internal jobs
that scan normal user tables, such as import, MV refresh, or internal
insert tasks, can still use normal stats validation and optimization.

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how was it fixed?
  2. Which behaviors were modified? What was the previous behavior, what is it now, why was it modified, and what impacts might the change have?
  3. What features were added, and why?
  4. Which code was refactored, and why was it refactored?
  5. Which functions were optimized, and what is the difference before and after the optimization?

@hello-stephen

run buildall
