[fix](nereids) Guard UniqueFunction in SetOperation and NLJ project rewrites#62754
[fix](nereids) Guard UniqueFunction in SetOperation and NLJ project rewrites#62754yujun777 wants to merge 1 commit into
Conversation
|
run buildall |
|
/review |
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
3f29634 to
b0eefc8
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found 1 blocking issue.
Critical Checkpoints
- Goal: Partially achieved. The set-operation pushdown and hash-join-condition fixes look correct, but the NLJ path is still semantically unsafe because repeated
UniqueFunctionexpressions are re-materialized later byAddProjectForUniqueFunction, and the new tests do not cover that late rewrite. - Scope: Small and focused.
- Concurrency: Not applicable; this is FE planner rewrite logic with no new shared-state or lock interactions.
- Lifecycle / static initialization: Not applicable.
- Config: No new configuration is introduced.
- Compatibility: No storage/protocol compatibility change is involved.
- Parallel paths: Not fully covered.
AddProjectForUniqueFunction.JoinRewriteis another join rewrite path that still aliases repeatedUniqueFunctionexpressions inside join conjuncts. - Special-condition handling: The new
containsUniqueFunction()guards are reasonable where added, but they are incomplete for the full join rewrite pipeline. - Tests: The new shape-plan tests cover the single-occurrence cases added here, but they miss the repeated-
UniqueFunctionjoin case that still goes through the lateAddProjectForUniqueFunctionrewrite. - Test result updates: The new
.outfiles are internally consistent, but the updatedadd_project_for_unique_function.outalso exposes the remaining semantic issue below. - Observability: No additional observability is needed for this planner-only change.
- Transaction / persistence / data writes / FE-BE variable passing: Not applicable.
- Performance: No new performance concern observed beyond the semantic issue below.
Finding
ProjectOtherJoinConditionForNestedLoopJoinnow skips aliasing unique-function expressions, but the same materialization still happens later inAddProjectForUniqueFunction.JoinRewritefor repeated unique functions created by rewrites such asBETWEEN. The updatedregression-test/data/nereids_rules_p0/unique_function/add_project_for_unique_function.outstill showsrandom(1, 100)projected underNestedLoopJoin, which changes evaluation from per join pair to per left row. That means the PR does not yet fully preserveUniqueFunctionsemantics for NLJ conditions.
| // it into a child Project, rand()'s evaluation granularity changes from "per join | ||
| // pair" to "per row of that child", which silently changes results. Keep such | ||
| // expressions inline in otherJoinConjuncts. | ||
| if (expression.containsUniqueFunction()) { |
There was a problem hiding this comment.
This guard stops PROJECT_OTHER_JOIN_CONDITION from pushing the expression into a child project, but the same semantic change still happens later in AddProjectForUniqueFunction.JoinRewrite. The updated regression-test/data/nereids_rules_p0/unique_function/add_project_for_unique_function.out still shows t1.id + t2.id + random(1, 100) between 10 and 20 becoming a PhysicalProject[random(...) AS ...] under the NLJ, so random() is evaluated once per left row instead of once per join pair. Please guard that later join rewrite too (or otherwise preserve per-pair evaluation), and add a regression that exercises the BETWEEN / duplicated-UniqueFunction path.`
There was a problem hiding this comment.
Replied in the main PR comment: this is a pre-existing trade-off in AddProjectForUniqueFunction.JoinRewrite caused by BETWEEN expansion into two independent rand() calls, and is intentionally out of scope for this PR. Please see the main comment for the full reasoning.
b0eefc8 to
768c688
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found one blocking issue.
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/ProjectOtherJoinConditionForNestedLoopJoin.java: the new early return is incomplete. RepeatedUniqueFunctions inside mixed-side NLJ conjuncts are still materialized into the left child later byAddProjectForUniqueFunction.JoinRewrite(see the updatedadd_project_for_unique_function.out), so queries like... ON t1.id + t2.id + rand() BETWEEN 10 AND 20still reuse one random value for all right rows of a left row instead of evaluating per join pair.
Checkpoint conclusions:
- Goal of the PR: partially achieved. The set-operation change looks correct, but the NLJ path is not fully fixed because a later rewrite reintroduces child-side materialization of repeated unique functions.
- Minimality/focus: yes, the change is small and localized.
- Concurrency/lifecycle/config/compatibility/persistence: not applicable here; I did not see issues in those areas.
- Parallel code paths: not fully covered.
AddProjectForUniqueFunction.JoinRewriteis another reachable rewrite path for join conjuncts and still needs the sameUniqueFunctionawareness. - Conditional checks: the new
containsUniqueFunction()split inPushDownFilterThroughSetOperationis reasonable; the NLJ guard is too local. - Test coverage: regression coverage was added, but it misses the repeated-
UniqueFunctionjoin case (BETWEEN/shared expression) that still rewrites incorrectly. The updatedadd_project_for_unique_function.outcontinues to show the problematic left-child projection. - Test result changes: the new
.outfiles for set operations look consistent with the intended behavior; the modifiedadd_project_for_unique_function.outhighlights the remaining NLJ problem above. - Performance: no new hot-path issue in the set-op change; for NLJ, the current approach also stops projecting safe deterministic opposite-side subexpressions once a unique function appears anywhere in the subtree, which may leave avoidable per-pair recomputation.
- User focus points: no additional user-provided focus points were supplied.
Because of the remaining NLJ correctness gap above, I am requesting changes.
| // it into a child Project, rand()'s evaluation granularity changes from "per join | ||
| // pair" to "per row of that child", which silently changes results. Keep such | ||
| // expressions inline in otherJoinConjuncts. | ||
| if (expression.containsUniqueFunction()) { |
There was a problem hiding this comment.
This guard is still too local to preserve join-pair semantics. ProjectOtherJoinConditionForNestedLoopJoin now stops aliasing the whole mixed subtree here, but later AddProjectForUniqueFunction.JoinRewrite still scans otherJoinConjuncts and hoists repeated UniqueFunctions into the left child project. The updated add_project_for_unique_function.out already shows that for t1.id + t2.id + random() between 10 and 20: $_random_7_$ is materialized on the left side, so the same random draw is reused for every right row of a given left row. That is still different from evaluating the ON predicate once per join pair.
Can we also block the later join-side materialization for unique functions that live inside mixed-side otherJoinConjuncts (or recurse here and only skip aliasing the exact unique-function subtree)? Otherwise this PR still leaves wrong-results cases in NLJ.
There was a problem hiding this comment.
Replied in the main PR comment: this is a pre-existing trade-off in AddProjectForUniqueFunction.JoinRewrite caused by BETWEEN expansion into two independent rand() calls, and is intentionally out of scope for this PR. Please see the main comment for the full reasoning.
There was a problem hiding this comment.
Requesting changes because one blocking correctness issue remains unresolved (already raised inline, so I am not duplicating the same inline comment here):
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/ProjectOtherJoinConditionForNestedLoopJoin.java:121now keepsUniqueFunctionexpressions inline, but the laterAddProjectForUniqueFunction.JoinRewritepath (fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AddProjectForUniqueFunction.java:199-239) still extracts duplicatedUniqueFunctionexpressions into the left child project. The updatedregression-test/data/nereids_rules_p0/unique_function/add_project_for_unique_function.outstill showsrandom(1, 100)materialized below the NLJ forjoin_1, so evaluation granularity changes from per join pair to per left row. Please guard that downstream join rewrite too and add regression coverage for the duplicated-UniqueFunction/BETWEENpath.
Critical checkpoints:
- Goal / correctness: Partially achieved. The set-operation guard looks correct, but the NLJ path is still semantically incorrect end-to-end because a later rewrite reintroduces the bug.
- Minimality / focus: Yes. The patch is small and focused on the targeted rewrites plus regression coverage.
- Concurrency: Not applicable. These are FE optimizer rewrites with no new concurrent state or locking.
- Lifecycle / static init: Not applicable.
- Config changes: None.
- Compatibility / storage / protocol: None.
- Parallel code paths: Not fully covered.
AddProjectForUniqueFunction.JoinRewriteremains inconsistent with the new NLJ guard. - Conditional checks: The new
containsUniqueFunction()checks are appropriate, but the downstream join rewrite needs the same protection. - Tests: Added shape-plan regression coverage for NLJ and set operations. However, the blocking NLJ case is still visible in the existing
add_project_for_unique_function.outoutput, and the new tests are shape-only rather than result-validating. - Observability: No additional observability seems necessary for this planner-only change.
- Transactions / persistence / FE-BE variable passing: Not applicable.
- Performance: The conjunct split for set operations is straightforward and low-cost.
- Other issues: No additional distinct issues found beyond the blocking problem above.
User focus:
- No additional user-provided review focus.
… SetOp predicate inference rewrites ### What problem does this PR solve? Issue Number: N/A Related PR: apache#62742 Problem Summary: Three Nereids rewrite rules were not aware of `UniqueFunction` (e.g. `rand()`, `uuid()`, `random()`) and could silently change the semantics of SQL that uses such functions. This PR is batch 2 of the `UniqueFunction` pushdown audit. The previously-reviewed batch 1 is in apache#62742. 1. `PushDownFilterThroughSetOperation` Pushing a filter that references a `UniqueFunction` through `UNION DISTINCT` / `INTERSECT` / `EXCEPT` duplicates each unique-fn call per child, which changes result semantics (e.g. `rand() > 0.5` would be evaluated independently for each branch instead of once above the set op). `UNION ALL` remains safe, because the set op is a pure concatenation and every row evaluates the filter exactly once below or above. The rewrite now splits conjuncts: pushable conjuncts (`UNION ALL`, or not containing a unique function) go through the child branches; non-pushable conjuncts stay above the set op. 2. `ProjectOtherJoinConditionForNestedLoopJoin` The rule extracts sub-expressions of NLJ other conditions into child projects as aliases. An expression containing a unique function must not be materialized once above the scan and then referenced in the join, because that separates its evaluation from the join pair. Expressions that contain a unique function are now left inline in the other condition. 3. `InferPredicates` (EXCEPT / INTERSECT branches) `visitLogicalExcept` / `visitLogicalIntersect` substitute slots of the first child through the set-op output to sibling children so that predicates on the first child can be inferred onto sibling children. When the pulled-up predicate contains a unique function (e.g. `t1.id + rand() > 5`), the substitution rewrites it into a predicate on the sibling (`t2.id + rand() > 5`) and re-adds it, which re-evaluates `rand()` on a different set of rows. The JOIN path does not do this substitution, so its pre-existing slot subset check is sufficient. We now skip any inferred expression that contains a unique function inside the universal inference loop, which protects SetOp as well as any future consumer that does similar slot substitution. All three fixes follow the same pattern used in batch 1: an early no-op branch when `expression.containsUniqueFunction()` is true. ### Release note None ### Check List (For Author) - Test: Regression test - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
768c688 to
64712e3
Compare
|
run buildall |
|
/review |
|
Thanks for the review. Addressing the concern about This is a pre-existing trade-off in that rule and is intentionally out of scope for this PR. Brief explanation:
For the three rules in this PR ( |
FE Regression Coverage ReportIncrement line coverage |
|
PR #62742 merge this PR's changes, no need this PR again |
What problem does this PR solve?
Issue Number: N/A
Related PR: #62742
Problem Summary:
Two additional Nereids rewrite rules were not aware of
UniqueFunction(e.g.rand(),uuid(),random()) and could silently change the semantics of SQL that uses such functions. This PR is batch 2 of theUniqueFunctionpushdown audit. The previously-reviewed batch 1 is in #62742.PushDownFilterThroughSetOperationPushing a filter that references a
UniqueFunctionthroughUNION DISTINCT/INTERSECT/EXCEPTduplicates each unique-fn call per child, which changes result semantics.UNION ALLremains safe. The rewrite now splits conjuncts: pushable conjuncts (UNION ALL, or not containing a unique function) go through the child branches; non-pushable conjuncts stay above the set op.ProjectOtherJoinConditionForNestedLoopJoinThe rule extracts sub-expressions of NLJ other conditions into child projects as aliases. An expression containing a unique function must not be materialized once above the scan and then referenced in the join, because that separates its evaluation from the join pair. Expressions that contain a unique function are now left inline in the other condition.
Both fixes follow the same pattern used in batch 1: an early no-op branch when
expression.containsUniqueFunction()is true.Note: an earlier revision of this PR also guarded
JoinUtils.isHashJoinConditionfort1.a = t2.b + rand()patterns. That guard was removed after discussion: hash join already evaluates each side's key exactly once per row (build-side once per right row; probe-side once per left row), which matches per-rowrand()semantics. Forcing NLJ would evaluaterand()|t1|×|t2| times and regress performance for no semantic benefit.Release note
None
Check List (For Author)
regression-test/suites/nereids_rules_p0/unique_function/push_down_filter_through_set_operation_with_unique_function.groovyregression-test/suites/nereids_rules_p0/unique_function/project_other_join_condition_for_nlj_with_unique_function.groovynereids_rules_p0/unique_functionsuites still passrand()/uuid()/random())