[refine](storage) remove mock column workaround in segment iterator common expr evaluation#62561
[refine](storage) remove mock column workaround in segment iterator common expr evaluation#62561Mryange wants to merge 2 commits into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Found 1 blocking issue.
be/src/exprs/vexpr_context.h: this change removes the 5-argexecute_conjuncts_and_filter_block(..., filter)overload, butbe/src/exec/operator/table_function_operator.cpp:718still calls it. Since only the 4-arg overload and the new 6-arg overload remain, the branch no longer builds.
Critical checkpoints:
- Goal of the task: remove the mock-column workaround in
SegmentIteratorby driving common-expression filtering withselected_size. After tracingSegmentIterator -> VExprContext::execute_filter -> VExpr::execute_column, that path looks directionally correct, but the task is not fully accomplished because the helper API change was not propagated to all callers, and there is no test proving the lazy-materialization row-count mismatch case. - Change size/focus: mostly focused, but the shared helper change missed another caller, so the refactor is not self-contained.
- Concurrency: no new concurrency, locking, or lifecycle risks in the touched code.
- Lifecycle/static initialization: no special lifecycle or static initialization concerns introduced.
- Configuration: no new config items.
- Compatibility: no FE/BE protocol or storage-format compatibility change, but there is an internal API compatibility break inside BE that currently leaves the tree uncompilable.
- Parallel code paths: not fully covered.
table_function_operator.cppstill uses the removed 5-arg helper. - Special conditions/checks: the new explicit
num_rowscontract is reasonable for lazy materialization, but it must be applied consistently to every caller. - Test coverage: no new regression or unit test was added for this row-count mismatch path, so there is still no automated proof for the targeted scenario.
- Test result files: none changed.
- Observability: no additional observability appears necessary for this refactor.
- Transaction, persistence, and data-write concerns: not applicable.
- FE-BE variable propagation: not applicable.
- Performance: expected to be neutral or slightly better by removing the temporary mock column.
- Other issues: I did not find another concrete runtime bug in the traced
SegmentIteratorcommon-expression path beyond the compile blocker above.
| std::vector<uint32_t>& columns_to_filter, | ||
| int column_to_keep, IColumn::Filter& filter); | ||
| int column_to_keep, IColumn::Filter& filter, | ||
| size_t num_rows); |
There was a problem hiding this comment.
This signature change removes the previous 5-arg overload, but be/src/exec/operator/table_function_operator.cpp:718 still calls execute_conjuncts_and_filter_block(..., filter) without the new num_rows argument. With only the 4-arg and 6-arg overloads left, this PR no longer builds. Please either update that caller (likely with output_block->rows()) or keep a wrapper overload here.
|
/review |
There was a problem hiding this comment.
No blocking findings.
Critical checkpoints:
- Goal / correctness: The PR removes the mock-column workaround and drives common-expr filtering from an explicit row count. I traced
SegmentIterator::_next_batch_internal -> _process_common_expr -> _execute_common_expr -> VExprContext::execute_conjuncts_and_filter_block -> VExpr::execute_filter; the newselected_sizeplumbing matches the lazy-materialization case, and the table-function caller was updated consistently. - Scope: The change is small and focused to the helper signature, its affected call sites, and the index-result column sizing logic.
- Concurrency: No new concurrent state, locks, or lifecycle interactions are introduced in the modified flow.
- Lifecycle / static init: No special lifecycle or static-initialization risk is introduced here.
- Config / compatibility / FE-BE protocol / persistence / transaction or data-write semantics: Not applicable for this change.
- Parallel code paths: The only non-segment caller (
table_function_operator.cpp) was updated; I did not find any remaining callers of the removed overload. - Conditional logic: Switching the DCHECK and the filter sizing to
selected_sizematches the actual invariant of this path better thanblock->rows(). - Observability: No additional logging or metrics look necessary for this refactor.
- Tests: No dedicated test was added in this PR, so the remaining risk is coverage for the exact
selected_size != block->rows()lazy-materialization path. I did not run build or regression tests as part of review. - Test result changes: None.
Overall summary:
The patch looks correct and focused. I do not see a blocking issue in the current implementation, though a targeted regression or BE unit test for the mismatched selected_size path would improve confidence.
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
run buildall |
TPC-H: Total hot run time: 29508 ms |
TPC-DS: Total hot run time: 172017 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Previously,
_process_common_exprinSegmentIteratorhad a workaround that temporarily replaced column 0 with a const default column ofselected_sizerows. This was necessary becauseexecute_conjuncts_and_filter_blockinternally usedblock->rows()to determine the filter size, but during lazy materialization the block columns could have different row counts than the actualselected_size.Now that
VExpr::execute_filterandVExpr::execute_columnaccept an explicitcount/num_rowsparameter, this mock column trick is no longer needed. The filter size can be driven byselected_sizedirectly.Changes
Removed mock column logic from
_process_common_expr— deleted theneed_mock_col/col0save/restore code and the associated comment.Added new
execute_conjuncts_and_filter_blockoverload with explicitnum_rows— a new 6-argument overload inVExprContextthat usesnum_rowsinstead ofblock->rows()for the filter size. Internally it callsexecute_filterwith the explicit row count.Deleted old 5-argument overload — the previous
execute_conjuncts_and_filter_block(ctxs, block, columns_to_filter, column_to_keep, filter)overload (which usedblock->rows()) was removed. Its only external caller (table_function_operator.cpp) was migrated to the new 6-arg overload withoutput_block->rows().Replaced
block->rows()withselect_sizein_output_index_result_column— the early return check,vec_match_pred.resize,null_map_vec.resize, and DCHECK all now use theselect_sizeparameter instead ofblock->rows().Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)