Fix several optimizations after lightweight deletes #101212
alesapin merged 5 commits into ClickHouse:master
Workflow [PR], commit [0f9947f]
Summary: ✅
AI Review
Summary
This PR fixes lightweight-delete-aware optimizer gating by replacing a sticky boolean with a live counter (
Missing context
Findings
❌ Blockers
ClickHouse Rules
Performance & Safety
The issue is safety/correctness-related: lock-free reads of
Final Verdict
…-free readers

Reorder counter increments before decrements so that `hasLightweightDeletedMask` (which reads the counter with `memory_order_relaxed`) never sees a transient false-negative. Previously, covered parts' counters were decremented before the new part's counter was incremented, creating a window where the counter could be zero while an active masked part already existed.

ClickHouse#101212 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
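The ordering described in this commit message can be illustrated with a minimal sketch. Every name here (`PartSketch`, `TableCountersSketch`, `replaceParts`) is hypothetical and is not the ClickHouse code; the sketch only assumes a single atomic counter that lock-free readers load with `memory_order_relaxed`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a data part; not the ClickHouse type.
struct PartSketch
{
    bool has_lightweight_delete_mask = false;
};

// Hypothetical stand-in for the table-level counter discussed above.
struct TableCountersSketch
{
    std::atomic<size_t> parts_with_mask{0};

    // Lock-free fast path: a relaxed load, as described in the commit message.
    bool hasMask() const { return parts_with_mask.load(std::memory_order_relaxed) != 0; }

    void add(const PartSketch & part)
    {
        if (part.has_lightweight_delete_mask)
            parts_with_mask.fetch_add(1, std::memory_order_relaxed);
    }

    void remove(const PartSketch & part)
    {
        if (part.has_lightweight_delete_mask)
        {
            size_t previous = parts_with_mask.fetch_sub(1, std::memory_order_relaxed);
            assert(previous > 0); // the counter must never underflow
            (void)previous;
        }
    }
};

// Replace `covered` parts with `new_part`: increment first, decrement second,
// so the counter can transiently over-count but never under-count.
inline void replaceParts(
    TableCountersSketch & counters, const PartSketch & new_part, const std::vector<PartSketch> & covered)
{
    counters.add(new_part);
    for (const auto & part : covered)
        counters.remove(part);
}
```

With this order, a masked part replaced by another masked part moves the counter 1 → 2 → 1 rather than 1 → 0 → 1, so a reader of the single atomic never observes zero in between. A transient over-count only disables an optimization briefly, which is the safe direction.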
018be70 to 4cd012b
…lete` counter

In `removePartsFromWorkingSet` and `forcefullyMovePartToDetachedAndRemoveFromMemory`, the counter was decremented while the part was still in `Active` state, creating a false-negative window for lock-free readers of `hasLightweightDeletedMask`.

In `swapActivePart`, the counter was incremented after the part was set to `Active`, creating the same kind of false-negative window.

Fix: transition state before decrementing counters (deactivation paths) and increment counters before transitioning to `Active` (activation paths).

Also add `SELECT count()` assertions to the test to verify that trivial count optimization is also correctly disabled/re-enabled across the lightweight delete lifecycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
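The activation and deactivation ordering from this commit message might look roughly like the sketch below. The types and functions (`PartStateSketch`, `StatefulPart`, `MaskCounter`, `activate`, `deactivate`) are hypothetical illustrations, and the atomic state field is an assumption made to keep the example self-contained; the real code manages part state differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical part states; only the ones needed for the illustration.
enum class PartStateSketch { Temporary, Active, Outdated };

// Hypothetical part with an observable state (atomic here is an assumption).
struct StatefulPart
{
    std::atomic<PartStateSketch> state{PartStateSketch::Temporary};
    bool has_lightweight_delete_mask = false;
};

// Hypothetical table-level counter of active parts carrying a mask.
struct MaskCounter
{
    std::atomic<size_t> value{0};
    bool hasMask() const { return value.load(std::memory_order_relaxed) != 0; }
};

// Activation path: make the counter reflect the part BEFORE it becomes Active.
inline void activate(MaskCounter & counter, StatefulPart & part)
{
    if (part.has_lightweight_delete_mask)
        counter.value.fetch_add(1, std::memory_order_relaxed);
    part.state.store(PartStateSketch::Active);
}

// Deactivation path: leave the Active state BEFORE decrementing the counter.
inline void deactivate(MaskCounter & counter, StatefulPart & part)
{
    part.state.store(PartStateSketch::Outdated);
    if (part.has_lightweight_delete_mask)
    {
        size_t previous = counter.value.fetch_sub(1, std::memory_order_relaxed);
        assert(previous > 0); // the counter must never underflow
        (void)previous;
    }
}

// The property the ordering is meant to preserve: an Active masked part
// implies a nonzero counter.
inline bool invariantHolds(const MaskCounter & counter, const StatefulPart & part)
{
    bool active_masked = part.state.load() == PartStateSketch::Active && part.has_lightweight_delete_mask;
    return !active_masked || counter.hasMask();
}
```

Between the two steps of either function, the counter may briefly count a part that is not `Active`, which errs toward keeping the optimizations disabled rather than enabling them too early.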
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.
The test |
    {
        addPartContributionToDataVolume(res.part);
        addPartContributionToUncompressedBytesInPatches(res.part);
        addPartContributionToTableCounters(res.part);
`hasLightweightDeletedMask` is now used as a correctness gate for `minmax_count_projection` / trivial `COUNT(*)`, but `loadDataPart` still makes an active part visible before updating `total_parts_with_lightweight_delete`.
In this function, `res.part->setState(to_state)` runs before insertion, so an `Active` part with a lightweight-delete mask can be observable while the counter is still 0 until `addPartContributionToTableCounters` runs. That is a transient false-negative for lock-free readers and can re-enable these optimizations too early.
Please keep activation monotonic for lock-free readers on this path as well (no temporary false-negatives), e.g. by ensuring the counter reflects the part before it can be observed as `Active`, or by validating under `readLockParts` when the fast-path counter says 0.
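The second option in this comment (validating under a lock when the fast-path counter says 0) could be sketched as follows. This is only an illustration under stated assumptions: `LoadedPart`, `PartsSketch`, and all of its methods are hypothetical, and a `std::shared_mutex` stands in for whatever `readLockParts` actually acquires.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical, simplified view of a loaded part.
struct LoadedPart
{
    bool active;
    bool has_mask;
};

// Hypothetical container; the shared_mutex is an assumed stand-in for the parts lock.
class PartsSketch
{
public:
    // Makes the part observable without touching the counter, mimicking the
    // problematic window described in the review comment.
    void publishPartOnly(LoadedPart part)
    {
        std::unique_lock<std::shared_mutex> lock(mutex);
        parts.push_back(part);
    }

    // Later step that finally accounts for the part in the counter.
    void bumpCounter() { counter.fetch_add(1, std::memory_order_relaxed); }

    // Fast path: lock-free, may be a false negative inside the window.
    bool hasMaskFast() const { return counter.load(std::memory_order_relaxed) != 0; }

    // Trust a nonzero counter; confirm a zero counter under the read lock.
    bool hasMaskValidated() const
    {
        if (hasMaskFast())
            return true;
        std::shared_lock<std::shared_mutex> lock(mutex);
        for (const auto & part : parts)
            if (part.active && part.has_mask)
                return true;
        return false;
    }

private:
    mutable std::shared_mutex mutex;
    std::vector<LoadedPart> parts;
    std::atomic<size_t> counter{0};
};
```

The trade-off is that a zero counter, likely the common case, now costs a lock acquisition and a scan, which is why updating the counter before the part becomes observable may be the cheaper of the two suggestions.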
LLVM Coverage Report
Changed lines: 94.77% (145/153) | lost baseline coverage: 100 line(s)
It introduces a race condition for these rows: https://github.com/ClickHouse/ClickHouse/pull/101212/files#diff-b7d33a7abda5f3365d5e4fb58ca98b756f6401372b2e2d5eb449a2bd3444fb5eR8576
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix `minmax_count_projection` and trivial `COUNT(*)` optimizations being permanently disabled after a lightweight delete, even after all parts with a lightweight-delete mask were merged away.