[fix](be) Remove original predicates after dict rewrite#63199
[fix](be) Remove original predicates after dict rewrite#63199Gabriel39 wants to merge 2 commits into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
I found one blocking review issue: this BE scan-path crash fix changes predicate execution behavior but has no regression coverage. The implementation itself is small and aligned with the stated goal: after dictionary predicate rewrite succeeds, the original string predicate is removed from the predicate-phase conjunct list so it is not evaluated against ColumnInt32 dictionary ids.
Critical checkpoint conclusions:
- Goal/test: The code appears to address the stated crash path, but there is no automated test proving the lazy-materialization + Parquet string dictionary IN predicate case no longer crashes.
- Scope: The code change is focused and minimal.
- Concurrency/lifecycle: No new concurrency, shared mutable state, or non-obvious lifecycle management is introduced.
- Config/compatibility: No new config, storage format, RPC, or compatibility concern.
- Parallel paths: Existing Parquet dictionary-filter tests exist, but the lazy materialization path that triggered this bug is not covered by this PR.
- Error handling: The new invariant check is consistent with the preceding lookup; no ignored Status was found.
- Data correctness/performance: Removing the original predicate after successful rewrite is logically consistent with the dictionary-id filtering phase and avoids redundant/crashing evaluation. The erase cost is negligible for the small conjunct vector.
- Observability: No new observability seems necessary for this narrow fix.
User focus: No additional user-provided focus points were present.
| // 4. Rewrite conjuncts. | ||
| RETURN_IF_ERROR(_rewrite_dict_conjuncts( | ||
| dict_codes, slot_id, temp_block.get_by_position(dict_pos).column->is_nullable())); | ||
| _remove_slot_filter_conjuncts(slot_id); |
There was a problem hiding this comment.
This fixes a BE crash path, but the PR only records manual testing and does not add an automated regression. Please add coverage for the exact scenario: Parquet dictionary filtering on a string IN predicate with lazy materialization enabled and at least one lazy-read column, so the predicate phase reads dictionary ids and would previously execute the original string predicate against ColumnInt32. There is already related coverage in regression-test/suites/external_table_p0/hive/test_string_dict_filter.groovy and test_parquet_lazy_mat_profile.groovy; extending one of those would prevent this crash from regressing.
|
run buildall |
TPC-H: Total hot run time: 29405 ms |
TPC-DS: Total hot run time: 172268 ms |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Parquet dictionary filtering rewrites string predicates to dictionary id predicates, but the original slot predicates could remain in the filter conjunct list. During lazy materialization the predicate column is decoded as INT32 dictionary ids, so executing the original string IN predicate can assert-cast ColumnInt32 to ColumnString and abort. Remove the original slot predicates after a successful dictionary predicate rewrite so only the rewritten int predicates are executed in the dictionary-id phase.
Release note
Fix BE crash when Parquet dictionary filtering executes string IN predicates during lazy materialization.
Check List (For Author)
Test: Manual test
Behavior changed: No
Does this need documentation: No
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)