[fix](fe) Add null reject compensation for join rewrite#63268
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 29856 ms |
TPC-DS: Total hot run time: 170885 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
92825ab to
3920f78
Compare
|
run buildall |
1 similar comment
|
run buildall |
TPC-H: Total hot run time: 29411 ms |
TPC-DS: Total hot run time: 170279 ms |
FE Regression Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Reviewed the PR against the Doris code-review checklist. I found one blocking test-coverage issue: the new regression checks that the MV appears as a rewrite candidate, but the result comparison can still execute the original base-table plan, so the end-to-end null-reject compensation path is not actually verified.
Critical checkpoint conclusions:
- Goal/test proof: the code targets OUTER JOIN MV rewrite for INNER JOIN queries needing null-reject compensation; unit tests cover predicate construction, but the regression does not prove runtime correctness of the chosen MV plan.
- Scope/focus: the implementation is localized to MV predicate compensation and related tests.
- Concurrency/lifecycle/config/compatibility: no new concurrency, lifecycle, config, persistence, or FE-BE protocol concerns found.
- Parallel paths: normal INNER/LEFT/FULL join MV paths were considered; ASOF join behavior appears outside this PR scope.
- Test coverage: needs strengthening as noted inline.
- Observability/performance: no additional observability needed; no material hot-path performance issue found.
User focus: no additional user-provided focus points were present.
### What problem does this PR solve? Issue Number: N/A Related PR: apache#62492 Problem Summary: INNER JoinEdge null-reject inference can validate rewriting an INNER JOIN query by an OUTER JOIN materialized view without adding the required non-null compensation predicate. The rewritten plan can keep null-padded rows from the MV side that should be rejected by the original query. Root cause: In AbstractMaterializedViewRule.predicatesCompensate(), the previous check treated INNER JoinEdge null-reject inference as proof that an OUTER JOIN MV rewrite was valid, but the proof was not materialized as a real IS NOT NULL predicate in the rewritten query. Change Summary: | File | Change Description | |------|--------------------| | AbstractMaterializedViewRule.java | Split predicate-based null-reject proof from INNER JoinEdge proof and add query-based IS NOT NULL compensation when only JoinEdge proof covers required MV nullable sides. Fail rewrite if no safe MV output slot can carry the compensation predicate. | | NullRejectInferenceTest.java | Add unit coverage for LEFT/FULL OUTER JOIN MV rewrites that require INNER JoinEdge null-reject compensation on both sides. | | inner_join_null_reject_compensation.groovy | Add regression coverage with unmatched OUTER JOIN MV rows, including the LEFT JOIN MV to INNER JOIN query repro with nullable join keys. | Design rationale: Existing query predicates already flow through normal predicate compensation, so they do not need extra filters. INNER JoinEdge proof is only logical evidence; when it is needed to reject null-generated MV rows, the rewrite must add a real IS NOT NULL predicate on an MV output slot. If no such slot is available, the rewrite is rejected conservatively. ### Release note Fixed an issue where OUTER JOIN materialized view rewrite could return extra null-padded rows for INNER JOIN queries. ### Check List (For Author) - Test - [x] Regression test - [x] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason Unit tests / checks: - Added NullRejectInferenceTest coverage for INNER/FULL join null-reject compensation on both sides. - Ran git diff --check. - Tried ./run-fe-ut.sh --run org.apache.doris.nereids.rules.exploration.mv.NullRejectInferenceTest, but FE core compilation failed before tests because generated cloud proto classes miss Cloud.CreateMetaSyncPointRequest/Response in MetaServiceClient and MetaServiceProxy. Regression test: - Added inner_join_null_reject_compensation.groovy for FULL/LEFT OUTER JOIN MV rewrites with unmatched null-padded rows. - Not run locally; the local FE UT build is currently blocked by the cloud proto compilation issue above. - Behavior changed: - [x] Yes. OUTER JOIN MV rewrite now adds real IS NOT NULL compensation when INNER JoinEdge null-reject inference is required, or rejects the rewrite if no safe MV output slot can carry that predicate. - [ ] No. - Does this need documentation? - [x] No. - [ ] Yes. ### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3920f78 to
bb824f3
Compare
|
run buildall |
TPC-H: Total hot run time: 31202 ms |
TPC-DS: Total hot run time: 168753 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Review result: no additional blocking issues found beyond the existing thread, which appears addressed in the current diff by forcing MV usage in the new correctness helper.
Critical checkpoint conclusions:
- Goal/test: The PR targets null-reject compensation when rewriting INNER JOIN queries using OUTER JOIN MVs. The implementation adds real IS NOT NULL compensation or rejects unsafe rewrites, with FE unit coverage and regression coverage for full/left outer MV cases.
- Scope: The change is focused in MV predicate compensation plus targeted tests.
- Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, configuration, persistence, or wire/storage compatibility concerns found.
- Parallel paths: The changed path is the common MV predicate compensation path; related regression expectations were updated for affected join rewrite behavior.
- Predicate correctness: Existing query-predicate proof and INNER JoinEdge proof are separated, and INNER JoinEdge-only proof is materialized as post-MV compensation where a safe output slot exists.
- Testing: Unit and regression coverage were added; existing review feedback about forcing MV execution is not re-raised because the current helper now uses use_mv and mv_rewrite_success before comparing results.
- Observability/performance: No new logging/metrics need found; added work is bounded to rewrite-time slot/predicate set checks.
User focus: No additional user-provided review focus was supplied.
What problem does this PR solve?
Issue Number: N/A
Related PR: #62492
Problem Summary:
INNER JoinEdge null-reject inference can validate rewriting an INNER JOIN query by an OUTER JOIN materialized view without adding the required non-null compensation predicate. The rewritten plan can keep null-padded rows from the MV side that should be rejected by the original query.
Root cause: In AbstractMaterializedViewRule.predicatesCompensate(), the previous check treated INNER JoinEdge null-reject inference as proof that an OUTER JOIN MV rewrite was valid, but the proof was not materialized as a real IS NOT NULL predicate in the rewritten query.
Change Summary:
AbstractMaterializedViewRule.java | Split predicate-based null-reject proof from INNER JoinEdge proof and add query-based IS NOT NULL compensation when only JoinEdge proof covers required MV nullable sides. Fail rewrite if no safe MV output slot can carry the compensation predicate.
NullRejectInferenceTest.java | Add unit coverage for LEFT/FULL OUTER JOIN MV rewrites that require INNER JoinEdge null-reject compensation on both sides.
inner_join_null_reject_compensation.groovy | Add regression coverage with unmatched OUTER JOIN MV rows, including the LEFT JOIN MV to INNER JOIN query repro with nullable join keys.
Design rationale: Existing query predicates already flow through normal predicate compensation, so they do not need extra filters. INNER JoinEdge proof is only logical evidence; when it is needed to reject null-generated MV rows, the rewrite must add a real IS NOT NULL predicate on an MV output slot. If no such slot is available, the rewrite is rejected conservatively.
Release note
Fixed an issue where OUTER JOIN materialized view rewrite could return extra null-padded rows for INNER JOIN queries.
Check List (For Author)
Test
Regression test
Unit Test
Manual test (add detailed scripts or steps below)
No need to test or manual test. Explain why:
This is a refactor/code format and no logic has been changed.
Previous test can cover this change.
No code files have been changed.
Other reason
Unit tests / checks:
Added NullRejectInferenceTest coverage for INNER/FULL join null-reject compensation on both sides.
Ran git diff --check.
Tried ./run-fe-ut.sh --run org.apache.doris.nereids.rules.exploration.mv.NullRejectInferenceTest, but FE core compilation failed before tests because generated cloud proto classes miss Cloud.CreateMetaSyncPointRequest/Response in MetaServiceClient and MetaServiceProxy.
Regression test:
Added inner_join_null_reject_compensation.groovy for FULL/LEFT OUTER JOIN MV rewrites with unmatched null-padded rows.
Not run locally; the local FE UT build is currently blocked by the cloud proto compilation issue above.
Behavior changed:
Yes. OUTER JOIN MV rewrite now adds real IS NOT NULL compensation when INNER JoinEdge null-reject inference is required, or rejects the rewrite if no safe MV output slot can carry that predicate.
No.
Does this need documentation?
No.
Yes.
Check List (For Reviewer who merge this PR)
Confirm the release note
Confirm test cases
Confirm document
Add branch pick label
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com