[fix](regression-test) stabilize 2 muted external_table_p0 tests by morningman · Pull Request #63646 · apache/doris

morningman · 2026-05-25T19:23:15Z

Summary

Both tests have been muted on the External Regression pipeline due to long-standing flakiness (analysis based on TeamCity build #92687 / id 953050). Neither is a real product bug — both are test-side robustness issues.

`test_file_cache_query_limit` (~50% pass rate)

After POST /api/file_cache?op=clear&sync=true the test waited exactly one file_cache_background_monitor_interval_ms window and then asserted normal_queue_curr_size == 0 once. The counters surfaced by information_schema.file_cache_statistics are republished by the background monitor on its own cadence, so a single fixed-time wait races the refresh and the assert fails roughly half the time even when the cache really is empty.

Replace the four wait-then-assert blocks (size == 0 after clear, size > 0 after a query) with Awaitility-based polling (already imported) on the relevant metric until the predicate holds, with a max(30s, 6 × monitor_interval) timeout.
The original assertFalse(...) calls with their metric-specific messages are kept as the final guard, so real failures still surface a precise reason.
The two waits for BE config propagation (enable_file_cache_query_limit flip) are left untouched — not in the failure path.

`test_hive_query_cache` (~20–25% fail rate)

The test { sql ...; time 20000 } block at L122 ran TPC-H Q9 against containerized hive parquet with enable_sql_cache=false set above, so the 20s upper bound was timing a cold 6-table join, not a cache hit. The query routinely exceeds 20s under cluster load.

Drop the time guard; the qt_tpch_1sf_q09 above already validates correctness, and the cache behavior is exercised in the subsequent blocks that explicitly enable sql cache.

Test plan

Run External Regression pipeline on this PR and confirm both cases pass.
After 5+ consecutive green runs, follow up to unmute these cases in TeamCity.

🤖 Generated with Claude Code

test_file_cache_query_limit: After `POST /api/file_cache?op=clear&sync=true` the test waited one file_cache_background_monitor_interval_ms window and then asserted normal_queue_curr_size == 0 once. The counters surfaced by information_schema.file_cache_statistics are republished by the background monitor on its own cadence, so a single fixed-time wait races the refresh and the assert fails roughly half the time even when the cache really is empty. Replace the four wait-then-assert blocks with Awaitility-based polling (already imported) on the relevant metric until the predicate holds, with a max(30s, 6 x monitor_interval) timeout. The original assertFalse calls with their metric-specific messages are kept as the final guard, so real failures still surface a precise reason. The two waits for BE config propagation are left untouched. test_hive_query_cache: The `test { sql ...; time 20000 }` block ran TPC-H Q9 against containerized hive parquet with enable_sql_cache=false set above, so the 20s upper bound was timing a cold join, not a cache hit. The query routinely exceeds 20s under cluster load, which explains the ~20-25% flake rate. Drop the time guard; the qt_ above already validates correctness, and the cache behavior is exercised in the subsequent blocks that explicitly enable sql cache. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

hello-stephen · 2026-05-25T19:23:20Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

morningman · 2026-05-25T19:41:51Z

run buildall

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix](regression-test) stabilize 2 muted external_table_p0 tests#63646

[fix](regression-test) stabilize 2 muted external_table_p0 tests#63646
morningman wants to merge 1 commit into
apache:masterfrom
morningman:fix-external-p0-20260525

morningman commented May 25, 2026

Uh oh!

hello-stephen commented May 25, 2026

Uh oh!

morningman commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

morningman commented May 25, 2026

Summary

test_file_cache_query_limit (~50% pass rate)

test_hive_query_cache (~20–25% fail rate)

Test plan

Uh oh!

hello-stephen commented May 25, 2026

Uh oh!

morningman commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`test_file_cache_query_limit` (~50% pass rate)

`test_hive_query_cache` (~20–25% fail rate)