[fix](fe) set cloud version_cache_ttl to 0 temporarily if retry a query with -230#63721
[fix](fe) set cloud version_cache_ttl to 0 temporarily if retry a query with -230#63721mymeiyi wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
This PR updates FE query retry behavior in cloud mode to temporarily disable cloud partition/table version caching when retrying after an E-230 error, and adds regression/unit tests to validate the behavior and session variable restoration.
Changes:
- In
StmtExecutor.queryRetry, detect E-230 retry conditions and setcloud_*_version_cache_ttl_msto0for the next retry attempt (viasetVarOnce), relying on existing per-execution session var reversion. - Add FE unit tests covering the E-230 detection logic and the temporary TTL override + revert path.
- Extend the cloud regression test to validate that session TTLs remain unchanged after a retry sequence triggered by injected E-230.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| regression-test/suites/cloud_p0/query_retry/test_retry_e-230.groovy | Adds a regression scenario verifying TTL values are restored after an E-230-triggered retry. |
| fe/fe-core/src/test/java/org/apache/doris/qe/StmtExecutorTest.java | Adds unit tests for E-230 detection and temporary TTL override + revert behavior. |
| fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java | Implements cloud-mode E-230 retry hook to temporarily disable version caching for the next attempt. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/review |
There was a problem hiding this comment.
Review summary:
No additional blocking issues found in the PR diff beyond the existing inline threads about the unconditional TTL log message and the typoed test future variable names.
Critical checkpoint conclusions:
- Goal and proof: The PR targets E-230 retry failures caused by stale cloud version caches by temporarily forcing table/partition version TTLs to 0 for the next retry. The added unit tests cover the decision and revert path, and the cloud regression test covers retry success with restored session TTLs.
- Scope and clarity: The production change is small and localized to StmtExecutor retry handling.
- Concurrency: No new shared cross-session mutable state is introduced. The session variable changes are scoped to the current ConnectContext and restored through the existing execute() finally path.
- Lifecycle: No new static initialization or object lifecycle concerns found. Config.cloud_unique_id changes in the unit test are restored in finally.
- Configuration: No new config/session variables are added; existing cloud TTL session variables are temporarily overridden through existing SET_VAR-style machinery.
- Compatibility: No FE-BE protocol, storage format, or persistence compatibility change found.
- Parallel paths: The change applies to StmtExecutor query retry; async MTMV retry has separate existing coverage and was not modified here.
- Conditional checks: The E-230/cloud-mode/TTL condition is consistent with the target failure mode.
- Tests and results: Added tests are relevant. Existing review already covers a minor test readability typo.
- Observability: Existing review already covers the misleading INFO log when setVarOnce fails; no additional observability blocker found.
- Transactions/persistence/data writes: No new persistence path is introduced. For query retry, successful execution observes freshly fetched cloud versions while original session TTLs are restored afterward.
- Performance: The extra work is only on retry after E-230 and consists of two session variable updates before replanning, so no hot-path performance issue found.
User focus: No additional user-provided review focus was supplied.
|
/review |
There was a problem hiding this comment.
I found one correctness issue in the retry path. The change handles normal session TTLs, but SQL-level SET_VAR hints can still overwrite the temporary zero before planning reads cloud visible versions.
Critical checkpoint conclusions:
- Goal/test: the PR targets E-230 retries with stale cloud version cache and adds unit/regression coverage, but the SET_VAR hint path is not covered and can bypass the fix.
- Scope: the implementation is small and focused.
- Concurrency/lifecycle: no new shared mutable state, locks, or special lifecycle concerns found.
- Config/compatibility: no new configuration or incompatible storage/protocol change.
- Parallel paths: normal session variables are covered, but the SQL-level SET_VAR path is a parallel session-variable mutation path and needs handling.
- Tests: added tests cover plain session TTL restore; they do not cover SET_VAR overriding the retry TTL.
- Observability/performance/transactions: no additional blocking issue found beyond the existing known INFO-log review thread.
User focus: no additional user-provided review focus was specified.
|
run buildall |
TPC-H: Total hot run time: 31011 ms |
TPC-DS: Total hot run time: 171852 ms |
FE Regression Coverage ReportIncrement line coverage |
FE UT Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
If a query get E-230 error and
cloud_partition_version_cache_ttl_msis not set to 0, this pr set the session var to 0 temporarily to get the newest version.