[refine](column) use concrete ColumnUInt8 for vertical delete filter#63652
[refine](column) use concrete ColumnUInt8 for vertical delete filter#63652Mryange wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
I found one performance regression in the storage compaction hot path. The change is small and focused, and it appears to preserve correctness, but it now allocates a new delete-filter column for every filtered block instead of reusing the existing column buffer.
Critical checkpoints: Goal: the PR aims to make the delete filter use a concrete ColumnUInt8; it mostly accomplishes that, but with an avoidable allocation regression. Scope: the code change is small and focused. Concurrency/lifecycle: no new shared state, locks, thread interactions, or special lifecycle concerns identified. Configuration/compatibility/protocol: none added or changed. Parallel paths: BlockReader still uses the old reuse pattern; this PR only changes VerticalBlockReader, and the divergence is the source of the issue. Tests: no new tests were added; this is a refactor, but no benchmark or compaction test result is provided to cover the hot path impact. Transaction/persistence/data visibility/MoW: no direct metadata, visible-version, delete-bitmap, or persistence behavior changes found. Observability: no new observability required. User focus: no additional user-provided review focus was specified.
| block->insert({std::move(_delete_filter_column), std::make_shared<DataTypeUInt8>(), | ||
| "__DORIS_COMPACTION_FILTER__"}); | ||
| auto filter_status = Block::filter_block(block, column_to_keep, column_to_keep); | ||
| _delete_filter_column = ColumnUInt8::create(); |
There was a problem hiding this comment.
This recreates the filter column after every block, so vertical compaction now allocates a fresh ColumnUInt8 buffer for each batch. The old code kept _delete_filter_column alive after filter_block() erased the temporary block column, so subsequent batches could reuse the existing capacity via resize(block_rows). This path runs once per block during unique-key vertical compaction, and the new per-batch allocation is avoidable. Please keep the concrete type but preserve buffer reuse, for example by inserting an immutable reference to the existing column for filtering and retaining _delete_filter_column after filter_block() removes the temporary column.
7151252 to
02e9567
Compare
|
/review |
There was a problem hiding this comment.
Review result: no additional blocking issues found in the live PR diff. The current changes keep the reusable ColumnUInt8 buffer alive on the success path while inserting a temporary immutable reference for filtering, which addresses the already-known allocation concern from the existing review thread. I did not find new data correctness, lifecycle, concurrency, compatibility, or storage-format issues in these two files. Test coverage is still not demonstrated in this diff, so please ensure the relevant vertical compaction path is covered by existing or added tests. User focus: no additional user-provided review focus was specified.
|
run buildall |
TPC-H: Total hot run time: 31812 ms |
TPC-DS: Total hot run time: 172306 ms |
|
run external |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Problem Summary:
The vertical block reader kept the delete filter column as the base
ColumnPtrtype even though the column is always created asColumnUInt8and later mutated as a byte filter column. Root cause:_delete_filter_columnused the generic column pointer type, so the reader had to callIColumn::mutateand reinterpret the result asColumnUInt8before resizing and filling the filter data. This PR changes the member toColumnUInt8::MutablePtr, directly operates on the concrete filter column, and moves it into the temporary block filter column when applyingBlock::filter_block. The filter column is recreated after filtering so the reader state remains valid on both success and error paths.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)