[feature](search) support MATCH projection as virtual column for inverted index evaluation#61092
airborne12 wants to merge 9 commits into apache:master
…verted index Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…column projections
Wire up virtual column MATCH expressions with inverted index evaluation in segment_iterator so that MATCH projections (pushed down via PreferPushDownProject) can leverage the fast index path instead of slow-path expression evaluation.
Changes:
- Match.java: add PreferPushDownProject interface
- segment_iterator.cpp: set IndexExecContext on virtual column exprs, evaluate inverted index for virtual column MATCH, and convert result bitmaps to UInt8 columns for fast_execute() cache
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests MATCH expressions as projections (not filters) pushed down as virtual columns on OlapScan, evaluated via inverted index.
Covers:
- Simple MATCH projection
- MATCH projection with FULL OUTER JOIN
- Multiple MATCH projections
- MATCH projection with additional filter
- MATCH_PHRASE projection
- Regression check that MATCH filter still works
- MATCH filter with INNER JOIN
- EXPLAIN output verification
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
run buildall
TPC-H: Total hot run time: 27635 ms
TPC-DS: Total hot run time: 153480 ms
…n projections
The critical bug: bitmap→column conversion in _output_index_result_column was only called from _process_common_expr, gated by _is_need_expr_eval. In FULL OUTER JOIN + projection-only (no WHERE), _is_need_expr_eval is false, so the conversion never ran and fast_execute() fell back to the slow path.
Fix: refactor _output_index_result_column_for_expr into a generic _output_index_result_column(vector<VExprContext*>, ...) and call it in step5 of _next_batch_internal for virtual column exprs, before _materialization_of_virtual_column, independent of _is_need_expr_eval.
Also:
- FE: add Project→Filter→OlapScan pattern to the rewrite rule
- FE: add unit tests for PushDownMatchProjectionAsVirtualColumn
- Regression: enhance tests with no-index, MOW UNIQUE, compound MATCH, and direct filter edge cases (13 total test cases)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
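As a rough illustration of the bitmap→column conversion this commit relocates, the sketch below turns a set of matching row ids into a per-row UInt8 column for one batch. It is not the Doris code: RowBitmap is a std::set standing in for whatever bitmap type the BE uses, and bitmap_to_uint8_column and batch_rowids are hypothetical names chosen for the example.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Stand-in for the index result bitmap: the set of row ids that matched.
using RowBitmap = std::set<uint32_t>;

// Hypothetical helper: for each row id in the current batch, emit 1 if the
// index marked it as a match and 0 otherwise. No rows are dropped, so the
// output always has exactly one entry per input row.
std::vector<uint8_t> bitmap_to_uint8_column(const RowBitmap& matched,
                                            const std::vector<uint32_t>& batch_rowids) {
    std::vector<uint8_t> column;
    column.reserve(batch_rowids.size());
    for (uint32_t rowid : batch_rowids) {
        column.push_back(matched.count(rowid) ? 1 : 0);
    }
    return column;
}
```

Because the helper depends only on the bitmap and the batch's row ids, it can run whether or not any filter expression needs evaluating, which is the property the fix relies on.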
run buildall
TPC-H: Total hot run time: 27766 ms
TPC-DS: Total hot run time: 152928 ms
zhiqiang-hhhh left a comment:
/**
* MATCH expressions in FULL OUTER JOIN projections optimization:
*
* Before optimization (brute-force approach):
* - Execute FULL OUTER JOIN ON A.k1 = B.k1 first
* └── Get complete join result set (all rows)
* - Perform projections on each row of join result:
* ├── Projection 1: A.k1 (simple column read)
* └── Projection 2: A.content MATCH_ANY 'hello' (complex expression)
* └── Evaluate MATCH for every row (no index, brute-force computation)
*
* After optimization (virtual column approach):
* - Pre-compute MATCH at OlapScan layer as virtual column
* └── Leverage inverted index for fast evaluation
* └── Cache result in IndexExecContext
* - Execute FULL OUTER JOIN ON A.k1 = B.k1
* └── A side already has pre-computed virtual column result
* - Perform projections
* ├── Projection 1: A.k1
* └── Projection 2: Read cached virtual column result directly
*
* Key insight: JOIN semantics forbid filtering, but don't prevent
* pre-computing with index and caching results for downstream use.
*/
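The "after optimization" path in the comment above can be pictured with a toy inverted index. This is a minimal sketch under stated assumptions, not the Doris implementation: the whitespace tokenizer, InvertedIndex, build_index and match_as_virtual_column are all hypothetical names invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy inverted index: term -> ids of rows whose text contains that term.
using InvertedIndex = std::map<std::string, std::vector<uint32_t>>;

// Build the index once by splitting each row on whitespace.
InvertedIndex build_index(const std::vector<std::string>& rows) {
    InvertedIndex index;
    for (uint32_t rowid = 0; rowid < rows.size(); ++rowid) {
        std::istringstream tokens(rows[rowid]);
        std::string term;
        while (tokens >> term) {
            auto& postings = index[term];
            // Avoid duplicate postings when a term repeats within one row.
            if (postings.empty() || postings.back() != rowid) {
                postings.push_back(rowid);
            }
        }
    }
    return index;
}

// Scan-side projection: one 0/1 value per row, looked up from the index.
// Every row is kept; only the match flag is pre-computed.
std::vector<uint8_t> match_as_virtual_column(const InvertedIndex& index,
                                             const std::string& term,
                                             std::size_t num_rows) {
    std::vector<uint8_t> column(num_rows, 0);
    auto it = index.find(term);
    if (it != index.end()) {
        for (uint32_t rowid : it->second) {
            column[rowid] = 1;
        }
    }
    return column;
}
```

The returned column has the same length as the input, so a join above the scan still sees every row; the match flag simply travels with each row as an extra column.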
PR approved by anyone and no changes requested.
...main/java/org/apache/doris/nereids/rules/rewrite/PushDownMatchProjectionAsVirtualColumn.java
run buildall
1824aa5 to b3db5dc
run buildall
…rojection
1. Add null check for getTableProperty() to prevent NPE when table property is not set.
2. Remove the SlotReference child check: Match expressions can have non-SlotReference children and should still be pushed down.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ite rules
1. BE: Clone virtual column exprs before set_index_context() to avoid cross-segment context corruption. IndexExecContext holds segment-specific index iterator references that would be overwritten on shared VExprContext.
2. FE: Add appendVirtualColumns/appendVirtualColumnsAndTopN to LogicalOlapScan. Multiple rewrite rules (CSE, MATCH, Score, Vector) can now coexist by appending virtual columns instead of replacing. Remove virtualColumns.isEmpty() guard from PushDownMatchProjectionAsVirtualColumn.
3. Tests: Strengthen PushDownMatchProjectionAsVirtualColumnTest (3→6 tests) with fine-grained assertions: alias name preservation, slot replacement correctness, duplicate MATCH dedup, and append-to-existing verification.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
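The BE item in this commit (clone before setting the context) can be sketched as follows. The types here are hypothetical stand-ins, not Doris classes: IndexContext models only "which segment this context belongs to", and Expr, clone() and prepare_for_segment are names made up for the example.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in: a context that refers to one segment's index.
struct IndexContext {
    std::string segment;  // which segment's index iterators this refers to
};

// Hypothetical stand-in for an expression that holds such a context.
struct Expr {
    std::shared_ptr<IndexContext> index_ctx;

    void set_index_context(std::shared_ptr<IndexContext> ctx) { index_ctx = std::move(ctx); }

    // A clone gets its own slot for the context, so later writes to the
    // original (or to other clones) do not affect it.
    std::shared_ptr<Expr> clone() const { return std::make_shared<Expr>(*this); }
};

// Each segment works on a private copy of the shared expression, so setting
// the context for one segment cannot overwrite another segment's context.
std::shared_ptr<Expr> prepare_for_segment(const std::shared_ptr<Expr>& shared_expr,
                                          const std::string& segment) {
    auto local = shared_expr->clone();
    local->set_index_context(std::make_shared<IndexContext>(IndexContext{segment}));
    return local;
}
```

Without the clone, two segments calling set_index_context() on the same shared object would leave only the last writer's context in place, which is the corruption the commit message describes.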
b3db5dc to eb6f1f8
run buildall
TPC-H: Total hot run time: 27639 ms
score() pushed by PushDownScoreTopNIntoOlapScan has no children but reaches evaluate_inverted_index(). Replace DCHECK_GE with graceful early return to avoid crash.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
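The shape of this change, a guard that returns early instead of a debug assertion, might look roughly like the sketch below. ExprNode, IndexEval and try_evaluate_inverted_index are hypothetical names for illustration and do not mirror the real signatures in the BE.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal expression node; `children` holds child column names
// and may be empty, as for a score() call pushed down as a virtual column.
struct ExprNode {
    std::string name;
    std::vector<std::string> children;
};

enum class IndexEval { Evaluated, Skipped };

// Instead of asserting that at least one child exists (which aborts a debug
// build), report that there is nothing to evaluate and let the caller
// continue on the regular path.
IndexEval try_evaluate_inverted_index(const ExprNode& expr) {
    if (expr.children.empty()) {
        return IndexEval::Skipped;
    }
    // ... an index lookup on expr.children[0] would go here ...
    return IndexEval::Evaluated;
}
```

A caller can treat Skipped as "no index result available" and evaluate the expression normally, so a childless expression no longer brings the process down.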
run buildall
TPC-DS: Total hot run time: 153786 ms
TPC-H: Total hot run time: 27571 ms
TPC-DS: Total hot run time: 152968 ms
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
In FULL OUTER JOIN queries, MATCH expressions in the SELECT list cannot be pushed down as filters (this would violate join semantics by incorrectly filtering rows). This means the inverted index cannot be used for MATCH evaluation, resulting in slow-path expression evaluation.
This PR enables MATCH expressions used as projections to be pushed down as virtual columns on OlapScan, allowing the BE to evaluate them via inverted index using the existing fast_execute() caching mechanism.
Example:
FE changes:
- Match.java: Add PreferPushDownProject interface so PushDownProject rule moves MATCH from join output into scan projections
- PushDownMatchProjectionAsVirtualColumn.java: New rewrite rule converting MATCH projections to virtual columns on OlapScan
- RuleType.java + Rewriter.java: Rule registration
BE changes (segment_iterator.cpp):
- _construct_compound_expr_context(): Set shared IndexExecContext on virtual column exprs
- _apply_index_expr(): Evaluate inverted index for virtual column MATCH (bitmap only, no row filtering)
- _output_index_result_column_for_expr(): Convert bitmap to UInt8 column for all index contexts (common exprs + virtual column exprs)
The bitmap result is cached in IndexExecContext, and when _materialization_of_virtual_column() calls VirtualSlotRef::execute_column() → MATCH's fast_execute(), it returns the pre-computed column directly.
Release note
Support MATCH expressions as projections pushed down to OlapScan as virtual columns, enabling inverted index evaluation for MATCH in contexts where it cannot be pushed as a filter (e.g., FULL OUTER JOIN).
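To picture the caching path described under the BE changes, here is a minimal sketch of a per-segment result cache with a fast path. It assumes results are keyed by expression identity; IndexResultCache, MatchExpr, put and this simplified fast_execute signature are hypothetical and are not the Doris API.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Column = std::vector<uint8_t>;

struct MatchExpr {};  // Hypothetical placeholder for a MATCH expression node.

// Hypothetical per-segment cache: expression identity -> pre-computed column.
class IndexResultCache {
public:
    void put(const MatchExpr* expr, Column column) { _columns[expr] = std::move(column); }

    // Fast path: copy the cached column into *out and return true if the
    // index already produced one; otherwise return false so the caller can
    // fall back to row-by-row evaluation.
    bool fast_execute(const MatchExpr* expr, Column* out) const {
        auto it = _columns.find(expr);
        if (it == _columns.end()) {
            return false;
        }
        *out = it->second;
        return true;
    }

private:
    std::unordered_map<const MatchExpr*, Column> _columns;
};
```

In this model the scan fills the cache once per segment, and materializing the virtual column becomes a lookup instead of a per-row MATCH evaluation.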
Check List (For Author)
Test
virtualColumn=id MATCH_ANY 'hello'
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)