[fix](regression-test) stabilize 2 muted external_table_p0 tests #63646
Open
morningman wants to merge 1 commit into
test_file_cache_query_limit:
After `POST /api/file_cache?op=clear&sync=true` the test waited one
file_cache_background_monitor_interval_ms window and then asserted
normal_queue_curr_size == 0 once. The counters surfaced by
information_schema.file_cache_statistics are republished by the
background monitor on its own cadence, so a single fixed-time wait
races the refresh and the assert fails roughly half the time even when
the cache really is empty.
Replace the four wait-then-assert blocks with Awaitility-based polling
(already imported) on the relevant metric until the predicate holds,
with a max(30s, 6 x monitor_interval) timeout. The original assertFalse
calls with their metric-specific messages are kept as the final guard,
so real failures still surface a precise reason. The two waits for BE
config propagation are left untouched.
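
The poll-then-assert pattern described above can be sketched in plain Java. This is a hand-rolled stand-in for Awaitility, not the code in the test suite: the class name `PollSketch`, the method names, the 10 ms poll interval and the simulated metric are all assumptions made for illustration. Only the `max(30s, 6 x monitor_interval)` timeout rule and the idea of keeping a final assert are taken from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hand-rolled stand-in for Awaitility-style polling; all names are hypothetical. */
class PollSketch {

    /** Timeout rule from the description: max(30s, 6 x monitor interval). */
    static long timeoutMs(long monitorIntervalMs) {
        return Math.max(30_000L, 6L * monitorIntervalMs);
    }

    /**
     * Re-evaluates the condition every pollMs until it holds or timeoutMs elapses.
     * Returns the last evaluation so the caller can still run its own assert.
     */
    static boolean pollUntil(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                break;
            }
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) {
        // Simulated metric: the first three reads see a stale non-zero size,
        // as if the background monitor had not republished the counters yet.
        AtomicInteger reads = new AtomicInteger();
        BooleanSupplier queueIsEmpty = () -> reads.incrementAndGet() > 3;

        boolean ok = pollUntil(queueIsEmpty, timeoutMs(5_000L), 10L);

        // Final guard, analogous to keeping the original assert after the poll,
        // so a real failure still reports a metric-specific reason.
        if (!ok) {
            throw new AssertionError("normal_queue_curr_size did not reach 0");
        }
        System.out.println("condition held after " + reads.get() + " reads");
    }
}
```

The poll returns as soon as the predicate holds, so the generous timeout only costs time when the metric never converges.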
test_hive_query_cache:
The `test { sql ...; time 20000 }` block ran TPC-H Q9 against
containerized hive parquet with enable_sql_cache=false set above, so
the 20s upper bound was timing a cold join, not a cache hit. The query
routinely exceeds 20s under cluster load, which explains the ~20-25%
flake rate. Drop the time guard; the qt_ above already validates
correctness, and the cache behavior is exercised in the subsequent
blocks that explicitly enable sql cache.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor, Author
run buildall
Summary
Both tests have been muted on the External Regression pipeline due to long-standing flakiness (analysis based on TeamCity build #92687 / id 953050). Neither is a real product bug — both are test-side robustness issues.
`test_file_cache_query_limit` (~50% pass rate)
After `POST /api/file_cache?op=clear&sync=true` the test waited exactly one `file_cache_background_monitor_interval_ms` window and then asserted `normal_queue_curr_size == 0` once. The counters surfaced by `information_schema.file_cache_statistics` are republished by the background monitor on its own cadence, so a single fixed-time wait races the refresh and the assert fails roughly half the time even when the cache really is empty.
- Replace the four wait-then-assert blocks (`size == 0` after clear, `size > 0` after a query) with `Awaitility`-based polling (already imported) on the relevant metric until the predicate holds, with a `max(30s, 6 × monitor_interval)` timeout.
- The original `assertFalse(...)` calls with their metric-specific messages are kept as the final guard, so real failures still surface a precise reason.
- The two waits for BE config propagation (`enable_file_cache_query_limit` flip) are left untouched; they are not in the failure path.
`test_hive_query_cache` (~20–25% fail rate)
The `test { sql ...; time 20000 }` block at L122 ran TPC-H Q9 against containerized hive parquet with `enable_sql_cache=false` set above, so the 20s upper bound was timing a cold 6-table join, not a cache hit. The query routinely exceeds 20s under cluster load.
- Drop the time guard. `qt_tpch_1sf_q09` above already validates correctness, and the cache behavior is exercised in the subsequent blocks that explicitly enable sql cache.
Test plan
🤖 Generated with Claude Code