branch-4.0: [fix](regression-test) stabilize 2 muted external_table_p0 tests #63646 #63746

Open
github-actions[bot] wants to merge 1 commit into branch-4.0 from auto-pick-63646-branch-4.0

@github-actions
Contributor

Cherry-picked from #63646

## Summary

Both tests have been muted on the External Regression pipeline due to
long-standing flakiness (analysis based on TeamCity build #92687 / id
953050). Neither is a real product bug — both are test-side robustness
issues.

### `test_file_cache_query_limit` (~50% pass rate)

After `POST /api/file_cache?op=clear&sync=true` the test waited exactly
one `file_cache_background_monitor_interval_ms` window and then asserted
`normal_queue_curr_size == 0` once. The counters surfaced by
`information_schema.file_cache_statistics` are republished by the
background monitor on its own cadence, so a single fixed-time wait races
the refresh and the assert fails roughly half the time even when the
cache really is empty.

- Replace the four wait-then-assert blocks (`size == 0` after clear,
`size > 0` after a query) with `Awaitility`-based polling (already
imported) on the relevant metric until the predicate holds, with a
`max(30s, 6 × monitor_interval)` timeout.
- The original `assertFalse(...)` calls with their metric-specific
messages are kept as the final guard, so real failures still surface a
precise reason.
- The two waits for BE config propagation
(`enable_file_cache_query_limit` flip) are left untouched — not in the
failure path.
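
The poll-until-predicate pattern described above can be sketched independently of the test framework. The following is a minimal Python illustration, not the actual Groovy test code (which uses `Awaitility`); `poll_timeout_s`, `poll_until`, and the 0.5 s poll interval are hypothetical names and values chosen for this sketch, and the metric readings are simulated.

```python
import time
from typing import Callable


def poll_timeout_s(monitor_interval_ms: int) -> float:
    """Timeout rule from the summary: max(30 s, 6 x monitor interval)."""
    return max(30.0, 6 * monitor_interval_ms / 1000.0)


def poll_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.5,  # assumed value, not taken from the PR
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-evaluate `predicate` until it holds or the timeout elapses.

    Returns True if the predicate held at some check, False on timeout.
    The predicate is always evaluated at least once.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated metric: the published counter lags behind the real cache
    # state and only reads 0 on the third refresh.
    readings = iter([4096, 4096, 0])
    current = {"size": None}

    def cache_is_empty() -> bool:
        current["size"] = next(readings)
        return current["size"] == 0

    ok = poll_until(cache_is_empty, timeout_s=poll_timeout_s(5000),
                    poll_interval_s=0.01)
    # Final guard, mirroring the retained assert in the test: a real
    # failure still reports the metric that was observed.
    assert ok, f"normal_queue_curr_size still {current['size']} after clear"
```

A single fixed wait checks the metric once and fails if the monitor has not republished yet; polling tolerates any refresh that lands inside the timeout, while the final assert keeps the failure message specific.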

### `test_hive_query_cache` (~20–25% fail rate)

The `test { sql ...; time 20000 }` block at line 122 of the test ran TPC-H Q9 against
containerized hive parquet with `enable_sql_cache=false` set above, so
the 20s upper bound was timing a cold 6-table join, not a cache hit. The
query routinely exceeds 20s under cluster load.

- Drop the time guard; the `qt_tpch_1sf_q09` above already validates
correctness, and the cache behavior is exercised in the subsequent
blocks that explicitly enable sql cache.

## Test plan

- [ ] Run External Regression pipeline on this PR and confirm both cases
pass.
- [ ] After 5+ consecutive green runs, follow up to unmute these cases
in TeamCity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen
Contributor

run buildall
