branch-4.1: [fix](inverted index) resolve variant sub-column indexes for score() #62992 #63078
Merged
yiguolei merged 1 commit into branch-4.1 on May 9, 2026
…62992)

### What problem does this PR solve?

Issue Number: N/A

Related PR:

Problem Summary:

Fix `score()` query failing on variant sub-columns with:

```text
Index statistics collection failed: Score query is not supported without inverted index for column=<variant.subcolumn>
```

`MatchPredicateCollector::collect` previously used `TabletSchema::inverted_indexs(int32_t col_unique_id, const std::string& suffix_path)`, which only consults `_col_id_suffix_to_index`. Variant sub-column indexes can also live in:

1. `_path_set_info_map` (`subcolumn_indexes` / `typed_path_set`), when the parent variant index is inherited by sub-columns or a typed path index is materialized.
2. `_index_by_unique_id_with_pattern`, when the inverted index is created on the parent variant column with `PROPERTIES("field_pattern"="...")`.

This PR fixes both paths:

- Use the column-aware `TabletSchema::inverted_indexs(const TabletColumn&)` lookup so inherited / typed-path variant sub-column indexes are resolved.
- Reuse the parent variant sub-column pattern matching logic from `variant_util` to resolve `MATCH_NAME` / `MATCH_NAME_GLOB` templates, then look up `inverted_index_by_field_pattern(parent_uid, matched_pattern)`.
- Clone matched field-pattern index metadata and set the actual Lucene field suffix so score collection uses keys such as `<parent_uid>.<parent_col>.user.name`, while `CollectInfo::owned_index_meta` keeps the cloned metadata alive.

Behaviour after this PR:

| Scenario | Behaviour |
| --- | --- |
| Variant sub-column index is inherited or materialized in `_path_set_info_map` | Schema lookup succeeds through `inverted_indexs(const TabletColumn&)`; score collection uses the real sub-column Lucene field. |
| Parent variant index uses exact `field_pattern`, e.g. `host` | Pattern lookup resolves the index and score collection uses `<parent_uid>.<parent_col>.host`. |
| Parent variant index uses glob `field_pattern`, e.g. `user.*`, and slot path is `user.name` | Parent template matching returns `user.*`; index lookup uses the matched pattern; score collection uses `<parent_uid>.<parent_col>.user.name`. |
| No index matches the sub-column | Existing unsupported-index error is preserved; BM25 score still requires an inverted index. |

### Release note

Fix `score()` queries on variant sub-columns whose inverted index is inherited, typed-path based, or defined through `field_pattern`.
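
For readers who want to see the `field_pattern` path from the Problem Summary in isolation, here is a minimal standalone sketch. It is not the code in this PR and does not use the real `variant_util` or `TabletSchema` APIs. The helpers `glob_match`, `match_field_pattern`, and `score_field_key` are hypothetical names, the glob semantics (`*` matching any run of characters including `.`) are an assumption, and the parent uid `7` and column name `v` in the usage note are made-up values. Only the key layout `<parent_uid>.<parent_col>.<sub_path>` comes from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for parent-template matching (NOT the variant_util API).
// Assumed semantics: '*' matches any run of characters, '?' matches exactly one.
bool glob_match(const std::string& pattern, const std::string& path) {
    std::size_t p = 0, s = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    std::size_t mark = 0;                  // path position that '*' currently covers up to
    while (s < path.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;
            mark = s;
        } else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == path[s])) {
            ++p;
            ++s;
        } else if (star != std::string::npos) {
            // Backtrack: let the last '*' absorb one more character.
            p = star + 1;
            s = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Returns the first pattern that matches the sub-column path, or nullptr.
// The matched pattern (not the concrete path) is what the index lookup would use.
const std::string* match_field_pattern(const std::vector<std::string>& patterns,
                                       const std::string& sub_path) {
    for (const std::string& pattern : patterns) {
        if (glob_match(pattern, sub_path)) return &pattern;
    }
    return nullptr;
}

// Builds a key shaped like <parent_uid>.<parent_col>.<sub_path>, i.e. the
// concrete sub-column path rather than the pattern.
std::string score_field_key(std::int32_t parent_uid, const std::string& parent_col,
                            const std::string& sub_path) {
    return std::to_string(parent_uid) + "." + parent_col + "." + sub_path;
}
```

With patterns `{"host", "user.*"}` and slot path `user.name`, `match_field_pattern` yields `user.*` for the index lookup, while `score_field_key(7, "v", "user.name")` produces `7.v.user.name` for score collection. A path with no matching pattern returns `nullptr`, which corresponds to the preserved unsupported-index error.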
Contributor: run buildall
yiguolei approved these changes May 9, 2026
Cherry-picked from #62992