[fix](compcation) fix mow compaction delete bitmap handling#63167
[fix](compcation) fix mow compaction delete bitmap handling#63167mrhhsg wants to merge 1 commit into
Conversation
|
run buildall |
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
b48cab6 to
1c13941
Compare
|
/review |
There was a problem hiding this comment.
Review result: request changes.
Critical checkpoints:
- Correctness: one regression test change filters out an inserted row from the pre/post compaction assertions, which weakens coverage and can hide the variant compaction behavior that this suite is meant to validate.
- Concurrency/locking: the storage-side delete bitmap changes were reviewed for the modified paths; no additional blocking concurrency issue was identified.
- Error handling: no new unchecked error path was identified in the modified storage code.
- Tests: the MOW delete-bitmap regression coverage is relevant, but the unrelated variant test expectation change should be fixed or justified without dropping the row from the assertion.
Existing review context: no existing inline review threads were present. User focus: no additional user-provided focus points. Repository instructions: no AGENTS.md or repository-local code-review skill file was found within the current directory, so this review followed the explicit PR review prompt and standard code-review criteria.
| qt_sql_2 "select k, cast(v['a'] as array<int>) from ${tableName} where size(cast(v['a'] as array<int>)) > 0 order by k" | ||
| qt_sql_3 "select k, v['a'], cast(v['b'] as string) from ${tableName} where length(cast(v['b'] as string)) > 4 order by k" | ||
| qt_sql_3 "select k, v['a'], cast(v['b'] as string) from ${tableName} where length(cast(v['b'] as string)) > 4 and k != 1029 order by k" | ||
| qt_sql_5 "select cast(v['b'] as string), cast(v['b']['c'] as string) from ${tableName} where cast(v['b'] as string) != 'null' and cast(v['b'] as string) != '{}' order by k desc, 1, 2 limit 10;" |
There was a problem hiding this comment.
This change drops the inserted k = 1029 row from both the before- and after-compaction checks instead of preserving the expected behavior. The same row is still inserted above and still appears in the full qt_sql_1/qt_sql_11 outputs, so filtering it here weakens this regression test and can hide a compaction/variant extraction regression for that key. Since this PR does not change variant query semantics, please keep the original predicate (or add a separate, justified test case) rather than excluding this row from the assertion.
4c36c05 to
7d5d887
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
Requesting changes. I found a blocking correctness issue in the core MOW delete-bitmap path. Critical checkpoints: correctness is not satisfied because cluster-key same-segment duplicates can keep the wrong row; regression coverage is incomplete because the new regression does not cover same-segment duplicate ordering; diff hygiene is not satisfied due to a trailing-whitespace failure from git diff --check. Existing review context: the variant sparse-limit assertion weakening is already covered by the existing inline thread, so I did not duplicate it. User focus: no additional user-provided focus points were supplied.
| } | ||
| _last_key = cur_key.to_string(); | ||
| if (_rowid_length > 0) { | ||
| // For cluster-key MOW tables, the primary key index contains rowid and |
There was a problem hiding this comment.
This fixes the seek skip, but it now preserves the first rowid encountered in a segment and deletes later rows with the same unique key. For cluster-key MOW the rowid suffix is encoded ascending (SegmentWriter::_encode_rowid), so duplicates within one segment are visited from smaller to larger rowid. When _last_key matches on the second duplicate, calculate_one() returns that later row as deleted, leaving the older row visible. The intended MOW behavior is to keep the newest row (larger rowid within the same segment, absent a higher sequence value), so this can still return stale data for exactly the same same-segment duplicate case this PR is trying to fix.
| -- !audit_log_schema -- | ||
| query_id varchar(48) Yes true \N | ||
| query_id varchar(128) Yes true \N | ||
| time datetime(3) Yes true \N |
There was a problem hiding this comment.
This line introduces trailing whitespace, and git diff --check reports it as an error. Please regenerate or trim the expected output so whitespace checks pass.
What problem does this PR solve?
Fix incorrect delete bitmap handling for cluster-key unique MOW tables during cumulative compaction.
Problem
test_compact_with_seq2can fail under fuzzy mode when cumulative compaction produces a rowset like[2-11]while a newer visible rowset such as[12-12]remains outside the compaction output. In that shape, duplicate unique keys canremain visible.
There are two related gaps:
rowset.
Changes
MergeIndexDeleteBitmapCalculatorwhen rowid is part of the primary key index, so in-segment duplicate unique keys are not skipped.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)