[fix](regression) handle cumulative delete-version compaction wait #64945
shuke987 wants to merge 1 commit into
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
Force-pushed from 7812dbb to ee175b9.
run buildall

/review
Reviewed PR #64945. I did not find a blocking issue or a valuable inline comment to add.
Critical checkpoint conclusions:
- Goal and proof: The change addresses the regression-helper wait path for cumulative compaction that reports E-2010 after meeting a delete version. The new exit condition requires the E-2010 status, cumulative point advancement, and a changed base success timestamp, while still preserving the existing `run_status` wait.
- Scope: The PR is small and focused; the only changed file is `regression-test/plugins/plugin_compaction.groovy`.
- Concurrency/lifecycle: No production concurrency or lifecycle logic is changed. The helper continues to poll BE state and exits only after the tablet is no longer reporting a running compaction plus the observed delete-version handoff evidence.
- Config/session/protocol compatibility: No new config, session variable, FE/BE protocol field, or expected-output format is introduced. The status fields read by the helper are already exposed by local and cloud tablet compaction status JSON.
- Parallel paths: I checked the local and cloud compaction HTTP/status paths. Both expose the cumulative point, cumulative/base status strings, and compaction timestamps used by this helper.
- Tests/results: No `.out` file is changed. I did not run a local regression suite in this runner, but GitHub reports successful P0 Regression, Cloud P0, NonConcurrent Regression, compile, FE UT, BE UT, and Cloud UT checks for the PR head.
- Performance/observability: The polling interval and logging behavior are unchanged; the added status checks are local parsing/comparison work inside the existing wait loop.
User focus: No additional user-provided review focus was supplied.
Subagent conclusions:
- optimizer-rewrite: no candidate issue; the helper-only patch has no optimizer/rewrite, join, aggregate, or Nereids semantic surface.
- tests-session-config: no candidate issue; no session/config propagation, expected-output, compatibility, or CI/style concern remained after static review.
- Convergence: Round 1 ended with both live subagents replying `NO_NEW_VALUABLE_FINDINGS` for the same ledger/comment set after the main merged dismissals and the empty proposed inline comment set were recorded.
```diff
 def success_time_unchanged = (oldStatus["last ${compaction_type} success time"] == tabletStatus["last ${compaction_type} success time"])
 def failure_time_unchanged = (oldStatus["last ${compaction_type} failure time"] == tabletStatus["last ${compaction_type} failure time"])
-running = running || (success_time_unchanged && failure_time_unchanged)
+running = running || ((success_time_unchanged && failure_time_unchanged) && !completedByBaseCompactionAfterDeleteVersion)
```
When E-2010 advances the cumulative point but base compaction hasn't run yet, `failure_time_unchanged` would be `false`, so `running` becomes `false`; we expect `running=true` in this case.
Good catch. Updated the wait condition so once E-2010 advances the cumulative point, the helper treats it as a delete-version handoff and keeps waiting until last base success time changes. A cumulative failure timestamp change alone no longer lets this path finish.
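The reviewer's counterexample can be checked mechanically by evaluating the interim `running` expression from the review diff with the described inputs. This is a hedged sketch in Java standing in for the Groovy helper; `InterimWaitCheck` and `interimRunning` are hypothetical names, and the parameters simply mirror the variables in the interim diff line.

```java
final class InterimWaitCheck {
    // Mirrors the interim Groovy line from the review diff:
    // running = running || ((success_time_unchanged && failure_time_unchanged)
    //                       && !completedByBaseCompactionAfterDeleteVersion)
    static boolean interimRunning(boolean running,
                                  boolean successTimeUnchanged,
                                  boolean failureTimeUnchanged,
                                  boolean completedByBaseCompactionAfterDeleteVersion) {
        return running
                || ((successTimeUnchanged && failureTimeUnchanged)
                    && !completedByBaseCompactionAfterDeleteVersion);
    }

    public static void main(String[] args) {
        // Reviewer's case: E-2010 advanced the cumulative point, base compaction
        // has not run (so the handoff is not complete), no compaction is running,
        // and the cumulative failure time changed.
        boolean result = interimRunning(
                false,  // running: BE reports no running compaction
                true,   // cumulative success time unchanged
                false,  // cumulative failure time changed
                false); // base compaction has not completed the handoff
        System.out.println("running = " + result); // prints "running = false"
    }
}
```

The interim rule lets the wait end even though base compaction has not run, which is the behavior the reviewer flagged; the updated rule instead keeps waiting on the E-2010 path until the base success time changes.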
Force-pushed from ee175b9 to 5a6d9d1.
run buildall

PR approved by anyone and no changes requested.
fix handle cumulative delete-version compaction wait
Summary
Fix `trigger_and_wait_compaction` so that Cloud cumulative compaction that meets a delete version does not wait until the 300 s timeout after valid progress has already happened.

When cumulative compaction meets a delete version, BE can return `[E-2010] cumulative compaction meet delete version`, advance the cumulative point, and let base compaction handle the rowsets. In that path the cumulative success/failure timestamps may not change, so the old helper kept polling even after base compaction had completed and `run_status=false`.

This patch treats `E-2010` plus cumulative point advancement plus a changed base success time as an equivalent completed cumulative delete-version path, while still waiting when `run_status=true`. If `E-2010` advances the cumulative point but the base success time has not changed yet, the helper keeps waiting even if the cumulative failure timestamp changed.

Root Cause
The case `compaction/test_compacation_with_delete.groovy` creates alternating data and delete rowsets, then calls `trigger_and_wait_compaction(tableName, "cumulative")`.

In Cloud mode this can legally follow:

- cumulative compaction returns `E-2010`
- the cumulative point advances
- base compaction handles the rowsets and completes

The helper only watched cumulative success/failure timestamp changes. In the failing log, base compaction completed in 448 ms, but the helper waited for 5 minutes because the cumulative timestamps did not change.
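The root cause can be illustrated with a small simulation of the old exit rule. This is a hedged Java sketch, not the Groovy helper itself: `OldWaitSimulation`, `Status`, `oldStillRunning`, and `pollsUntilExit` are hypothetical names, the timestamps are made-up values, and a finite list of poll snapshots stands in for the real 300 s polling window.

```java
import java.util.List;

final class OldWaitSimulation {
    // Hypothetical snapshot of the fields the old helper compared between polls.
    static final class Status {
        final boolean runStatus;       // BE reports a compaction still running
        final long lastCumuSuccessMs;  // "last cumulative success time"
        final long lastCumuFailureMs;  // "last cumulative failure time"

        Status(boolean runStatus, long lastCumuSuccessMs, long lastCumuFailureMs) {
            this.runStatus = runStatus;
            this.lastCumuSuccessMs = lastCumuSuccessMs;
            this.lastCumuFailureMs = lastCumuFailureMs;
        }
    }

    // Old exit rule: keep waiting while a compaction is running, or while neither
    // cumulative timestamp has changed since the compaction was triggered.
    static boolean oldStillRunning(Status before, Status now) {
        boolean successUnchanged = before.lastCumuSuccessMs == now.lastCumuSuccessMs;
        boolean failureUnchanged = before.lastCumuFailureMs == now.lastCumuFailureMs;
        return now.runStatus || (successUnchanged && failureUnchanged);
    }

    // Index of the first poll at which the helper stops waiting, or -1 if it
    // waits through every snapshot (the stand-in for hitting the timeout).
    static int pollsUntilExit(Status before, List<Status> polls) {
        for (int i = 0; i < polls.size(); i++) {
            if (!oldStillRunning(before, polls.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Status before = new Status(false, 1_000L, 2_000L);
        // E-2010 path: base compaction runs and finishes, but the cumulative
        // timestamps never change, so the old rule never ends the wait.
        List<Status> polls = List.of(
                new Status(true, 1_000L, 2_000L),   // base compaction running
                new Status(false, 1_000L, 2_000L),  // base compaction done
                new Status(false, 1_000L, 2_000L));
        System.out.println(pollsUntilExit(before, polls)); // prints -1
    }
}
```

Because `run_status=false` alone is not enough to exit, any path that finishes without touching the cumulative timestamps runs the loop into its timeout.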
Validation
- `git diff --check`
- `git diff --check origin/master..HEAD`
- `E-2010 + cumulative point advanced + base success time changed + run_status=false` exits wait
- `E-2010 + cumulative point advanced` keeps waiting if base success time has not changed
- `E-2010 + cumulative point advanced + cumulative failure time changed` still keeps waiting if base success time has not changed
- `run_status=true` keeps waiting even if the delete-version/base-success condition is met

Cloud P0 rerun is still needed for final validation.
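The validation cases listed above can be expressed as a table-driven check. This is a minimal Java sketch of the exit rule as the summary describes it, assuming the helper has already reduced two tablet status snapshots to booleans; `DeleteVersionWaitRule`, `Observation`, and `stillRunning` are hypothetical names, and the branch order is one reading of the description, not the code in `plugin_compaction.groovy`.

```java
final class DeleteVersionWaitRule {
    // Hypothetical booleans derived from the status snapshot taken before
    // triggering compaction and the snapshot at the current poll.
    static final class Observation {
        final boolean runStatus;               // run_status: a compaction is still running
        final boolean metDeleteVersion;        // cumulative status reports E-2010
        final boolean cumulativePointAdvanced; // cumulative point moved forward
        final boolean baseSuccessTimeChanged;  // "last base success time" changed
        final boolean cumuSuccessTimeChanged;  // "last cumulative success time" changed
        final boolean cumuFailureTimeChanged;  // "last cumulative failure time" changed

        Observation(boolean runStatus, boolean metDeleteVersion, boolean cumulativePointAdvanced,
                    boolean baseSuccessTimeChanged, boolean cumuSuccessTimeChanged,
                    boolean cumuFailureTimeChanged) {
            this.runStatus = runStatus;
            this.metDeleteVersion = metDeleteVersion;
            this.cumulativePointAdvanced = cumulativePointAdvanced;
            this.baseSuccessTimeChanged = baseSuccessTimeChanged;
            this.cumuSuccessTimeChanged = cumuSuccessTimeChanged;
            this.cumuFailureTimeChanged = cumuFailureTimeChanged;
        }
    }

    // Returns true while the helper should keep polling.
    static boolean stillRunning(Observation o) {
        if (o.runStatus) {
            return true; // never exit while BE reports a running compaction
        }
        if (o.metDeleteVersion && o.cumulativePointAdvanced) {
            // Delete-version handoff: only a new base success completes the wait;
            // a cumulative failure-time change alone does not.
            return !o.baseSuccessTimeChanged;
        }
        // Otherwise keep the original rule: wait until a cumulative timestamp changes.
        return !o.cumuSuccessTimeChanged && !o.cumuFailureTimeChanged;
    }

    public static void main(String[] args) {
        // E-2010 + cumulative point advanced + base success time changed + run_status=false
        System.out.println(stillRunning(new Observation(false, true, true, true, false, false)));
        // E-2010 + cumulative point advanced + cumulative failure time changed, base unchanged
        System.out.println(stillRunning(new Observation(false, true, true, false, false, true)));
    }
}
```

Checking `runStatus` first keeps the existing `run_status` wait intact, and giving the handoff case its own branch is what prevents a cumulative failure-time change from ending the wait before base compaction succeeds.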