[fix](nereids) Guard UniqueFunction in SetOperation and NLJ project rewrites #62754

Closed
yujun777 wants to merge 1 commit into apache:master from yujun777:fix/unique-fn-pushdown-batch2

Contributor

@yujun777 yujun777 commented Apr 23, 2026

What problem does this PR solve?

Issue Number: N/A

Related PR: #62742

Problem Summary:

Two additional Nereids rewrite rules were not aware of UniqueFunction (e.g. rand(), uuid(), random()) and could silently change the semantics of SQL that uses such functions. This PR is batch 2 of the UniqueFunction pushdown audit. The previously-reviewed batch 1 is in #62742.

  1. PushDownFilterThroughSetOperation
    Pushing a filter that references a UniqueFunction through UNION DISTINCT / INTERSECT / EXCEPT duplicates each unique-fn call per child, which changes result semantics. UNION ALL remains safe. The rewrite now splits the filter's conjuncts: a conjunct is pushed into the child branches if the set op is UNION ALL or the conjunct contains no unique function; every other conjunct stays above the set op.

  2. ProjectOtherJoinConditionForNestedLoopJoin
    The rule extracts sub-expressions of NLJ other conditions into child projects as aliases. An expression containing a unique function must not be materialized once above the scan and then referenced in the join, because that separates its evaluation from the join pair. Expressions that contain a unique function are now left inline in the other condition.

Both fixes follow the same pattern used in batch 1: an early no-op branch when expression.containsUniqueFunction() is true.
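
The conjunct split described for PushDownFilterThroughSetOperation can be sketched as below. This is a minimal standalone illustration, not the Doris implementation: `SetOpKind`, `Conjunct`, and `SetOpFilterSplit` are hypothetical names invented for this example, and only the decision rule (push when the set op is UNION ALL or the conjunct has no unique function) is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for planner types; these are not the Doris classes.
enum SetOpKind { UNION_ALL, UNION_DISTINCT, INTERSECT, EXCEPT }

final class Conjunct {
    final String sql;
    final boolean containsUniqueFunction;

    Conjunct(String sql, boolean containsUniqueFunction) {
        this.sql = sql;
        this.containsUniqueFunction = containsUniqueFunction;
    }
}

final class SetOpFilterSplit {
    final List<Conjunct> pushable = new ArrayList<>();
    final List<Conjunct> remaining = new ArrayList<>();

    // A conjunct goes to the children when the set op is UNION ALL or the
    // conjunct has no unique function; otherwise it stays above the set op.
    static SetOpFilterSplit split(SetOpKind kind, List<Conjunct> conjuncts) {
        SetOpFilterSplit result = new SetOpFilterSplit();
        for (Conjunct c : conjuncts) {
            if (kind == SetOpKind.UNION_ALL || !c.containsUniqueFunction) {
                result.pushable.add(c);
            } else {
                result.remaining.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Conjunct> filter = List.of(
                new Conjunct("id > 10", false),
                new Conjunct("rand() > 0.5", true));
        SetOpFilterSplit s = split(SetOpKind.INTERSECT, filter);
        System.out.println("pushed: " + s.pushable.size()
                + ", kept above: " + s.remaining.size());
    }
}
```

With the same two conjuncts and `SetOpKind.UNION_ALL`, both conjuncts would land in `pushable`, matching the statement that UNION ALL remains safe.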

Note: an earlier revision of this PR also guarded JoinUtils.isHashJoinCondition for t1.a = t2.b + rand() patterns. That guard was removed after discussion: hash join already evaluates each side's key exactly once per row (build-side once per right row; probe-side once per left row), which matches per-row rand() semantics. Forcing NLJ would evaluate rand() |t1|×|t2| times and regress performance for no semantic benefit.
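
To make the call counts in the note concrete, here is a small counting sketch for `t1.a = t2.b + rand()`. It only counts how often a stand-in for rand() would be invoked under each strategy; `JoinEvalCount` and its methods are hypothetical names, and the loops model the evaluation pattern described above rather than any real join operator.

```java
final class JoinEvalCount {
    // Hash join on t1.a = t2.b + rand(): only the build-side key holds the
    // unique function, so it is called once per right (build) row.
    static long hashJoinCalls(long leftRows, long rightRows) {
        long calls = 0;
        for (long r = 0; r < rightRows; r++) {
            calls++; // build key: t2.b + rand()
        }
        // The probe key t1.a has no unique function, so leftRows adds no calls.
        return calls;
    }

    // Nested loop join: the whole condition is evaluated for every join pair.
    static long nestedLoopCalls(long leftRows, long rightRows) {
        long calls = 0;
        for (long l = 0; l < leftRows; l++) {
            for (long r = 0; r < rightRows; r++) {
                calls++; // t1.a = t2.b + rand() evaluated for this pair
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("hash join calls:   " + hashJoinCalls(1000, 200));
        System.out.println("nested loop calls: " + nestedLoopCalls(1000, 200));
    }
}
```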

Release note

None

Check List (For Author)

  • Test: Regression test
    • Added regression-test/suites/nereids_rules_p0/unique_function/push_down_filter_through_set_operation_with_unique_function.groovy
    • Added regression-test/suites/nereids_rules_p0/unique_function/project_other_join_condition_for_nlj_with_unique_function.groovy
    • All existing nereids_rules_p0/unique_function suites still pass
  • Behavior changed: No (corrects previously-incorrect semantics under rand()/uuid()/random())
  • Does this need documentation: No

@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

/review

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777 yujun777 marked this pull request as draft April 23, 2026 08:53
@yujun777 yujun777 marked this pull request as ready for review April 23, 2026 08:54
@yujun777 yujun777 force-pushed the fix/unique-fn-pushdown-batch2 branch from 3f29634 to b0eefc8 Compare April 23, 2026 09:03
@yujun777 yujun777 changed the title [fix](nereids) Guard UniqueFunction in SetOperation / hashJoin / NLJ project rewrites [fix](nereids) Guard UniqueFunction in SetOperation and NLJ project rewrites Apr 23, 2026
@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

I found 1 blocking issue.

Critical Checkpoints

  • Goal: Partially achieved. The set-operation pushdown and hash-join-condition fixes look correct, but the NLJ path is still semantically unsafe because repeated UniqueFunction expressions are re-materialized later by AddProjectForUniqueFunction, and the new tests do not cover that late rewrite.
  • Scope: Small and focused.
  • Concurrency: Not applicable; this is FE planner rewrite logic with no new shared-state or lock interactions.
  • Lifecycle / static initialization: Not applicable.
  • Config: No new configuration is introduced.
  • Compatibility: No storage/protocol compatibility change is involved.
  • Parallel paths: Not fully covered. AddProjectForUniqueFunction.JoinRewrite is another join rewrite path that still aliases repeated UniqueFunction expressions inside join conjuncts.
  • Special-condition handling: The new containsUniqueFunction() guards are reasonable where added, but they are incomplete for the full join rewrite pipeline.
  • Tests: The new shape-plan tests cover the single-occurrence cases added here, but they miss the repeated-UniqueFunction join case that still goes through the late AddProjectForUniqueFunction rewrite.
  • Test result updates: The new .out files are internally consistent, but the updated add_project_for_unique_function.out also exposes the remaining semantic issue below.
  • Observability: No additional observability is needed for this planner-only change.
  • Transaction / persistence / data writes / FE-BE variable passing: Not applicable.
  • Performance: No new performance concern observed beyond the semantic issue below.

Finding

  • ProjectOtherJoinConditionForNestedLoopJoin now skips aliasing unique-function expressions, but the same materialization still happens later in AddProjectForUniqueFunction.JoinRewrite for repeated unique functions created by rewrites such as BETWEEN. The updated regression-test/data/nereids_rules_p0/unique_function/add_project_for_unique_function.out still shows random(1, 100) projected under NestedLoopJoin, which changes evaluation from per join pair to per left row. That means the PR does not yet fully preserve UniqueFunction semantics for NLJ conditions.
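
The difference between "per join pair" and "per left row" evaluation can be illustrated with a deterministic stand-in for a unique function. In this sketch, `NljGranularity` and `freshCounter` are hypothetical names; the counter returns a new value on every call so that reuse of a value is easy to observe. It models the two evaluation placements only and does not reproduce the Doris plan.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

final class NljGranularity {
    // Deterministic stand-in for a unique function: every call yields a new value.
    static LongSupplier freshCounter() {
        long[] next = {0};
        return () -> next[0]++;
    }

    // Unique function left inline in the join condition: one call per join pair.
    static int distinctValuesInline(int leftRows, int rightRows) {
        LongSupplier uniqueFn = freshCounter();
        Set<Long> seen = new HashSet<>();
        for (int l = 0; l < leftRows; l++) {
            for (int r = 0; r < rightRows; r++) {
                seen.add(uniqueFn.getAsLong());
            }
        }
        return seen.size();
    }

    // Unique function projected under the left child: one call per left row,
    // and that value is reused for every right row paired with it.
    static int distinctValuesProjectedLeft(int leftRows, int rightRows) {
        LongSupplier uniqueFn = freshCounter();
        Set<Long> seen = new HashSet<>();
        for (int l = 0; l < leftRows; l++) {
            long alias = uniqueFn.getAsLong();
            for (int r = 0; r < rightRows; r++) {
                seen.add(alias);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("inline:         " + distinctValuesInline(3, 4));
        System.out.println("projected left: " + distinctValuesProjectedLeft(3, 4));
    }
}
```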

// it into a child Project, rand()'s evaluation granularity changes from "per join
// pair" to "per row of that child", which silently changes results. Keep such
// expressions inline in otherJoinConjuncts.
if (expression.containsUniqueFunction()) {
Contributor

This guard stops PROJECT_OTHER_JOIN_CONDITION from pushing the expression into a child project, but the same semantic change still happens later in AddProjectForUniqueFunction.JoinRewrite. The updated regression-test/data/nereids_rules_p0/unique_function/add_project_for_unique_function.out still shows t1.id + t2.id + random(1, 100) between 10 and 20 becoming a PhysicalProject[random(...) AS ...] under the NLJ, so random() is evaluated once per left row instead of once per join pair. Please guard that later join rewrite too (or otherwise preserve per-pair evaluation), and add a regression that exercises the BETWEEN / duplicated-UniqueFunction path.

Contributor Author

Replied in the main PR comment: this is a pre-existing trade-off in AddProjectForUniqueFunction.JoinRewrite caused by BETWEEN expansion into two independent rand() calls, and is intentionally out of scope for this PR. Please see the main comment for the full reasoning.

@yujun777 yujun777 force-pushed the fix/unique-fn-pushdown-batch2 branch from b0eefc8 to 768c688 Compare April 23, 2026 09:23
@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

I found one blocking issue.

  1. fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/ProjectOtherJoinConditionForNestedLoopJoin.java: the new early return is incomplete. Repeated UniqueFunctions inside mixed-side NLJ conjuncts are still materialized into the left child later by AddProjectForUniqueFunction.JoinRewrite (see the updated add_project_for_unique_function.out), so queries like ... ON t1.id + t2.id + rand() BETWEEN 10 AND 20 still reuse one random value for all right rows of a left row instead of evaluating per join pair.

Checkpoint conclusions:

  • Goal of the PR: partially achieved. The set-operation change looks correct, but the NLJ path is not fully fixed because a later rewrite reintroduces child-side materialization of repeated unique functions.
  • Minimality/focus: yes, the change is small and localized.
  • Concurrency/lifecycle/config/compatibility/persistence: not applicable here; I did not see issues in those areas.
  • Parallel code paths: not fully covered. AddProjectForUniqueFunction.JoinRewrite is another reachable rewrite path for join conjuncts and still needs the same UniqueFunction awareness.
  • Conditional checks: the new containsUniqueFunction() split in PushDownFilterThroughSetOperation is reasonable; the NLJ guard is too local.
  • Test coverage: regression coverage was added, but it misses the repeated-UniqueFunction join case (BETWEEN/shared expression) that still rewrites incorrectly. The updated add_project_for_unique_function.out continues to show the problematic left-child projection.
  • Test result changes: the new .out files for set operations look consistent with the intended behavior; the modified add_project_for_unique_function.out highlights the remaining NLJ problem above.
  • Performance: no new hot-path issue in the set-op change; for NLJ, the current approach also stops projecting safe deterministic opposite-side subexpressions once a unique function appears anywhere in the subtree, which may leave avoidable per-pair recomputation.
  • User focus points: no additional user-provided focus points were supplied.

Because of the remaining NLJ correctness gap above, I am requesting changes.

// it into a child Project, rand()'s evaluation granularity changes from "per join
// pair" to "per row of that child", which silently changes results. Keep such
// expressions inline in otherJoinConjuncts.
if (expression.containsUniqueFunction()) {
Contributor

This guard is still too local to preserve join-pair semantics. ProjectOtherJoinConditionForNestedLoopJoin now stops aliasing the whole mixed subtree here, but later AddProjectForUniqueFunction.JoinRewrite still scans otherJoinConjuncts and hoists repeated UniqueFunctions into the left child project. The updated add_project_for_unique_function.out already shows that for t1.id + t2.id + random() between 10 and 20: $_random_7_$ is materialized on the left side, so the same random draw is reused for every right row of a given left row. That is still different from evaluating the ON predicate once per join pair.

Can we also block the later join-side materialization for unique functions that live inside mixed-side otherJoinConjuncts (or recurse here and only skip aliasing the exact unique-function subtree)? Otherwise this PR still leaves wrong-results cases in NLJ.

Contributor Author

Replied in the main PR comment: this is a pre-existing trade-off in AddProjectForUniqueFunction.JoinRewrite caused by BETWEEN expansion into two independent rand() calls, and is intentionally out of scope for this PR. Please see the main comment for the full reasoning.

Contributor

@github-actions github-actions Bot left a comment

Requesting changes because one blocking correctness issue remains unresolved (already raised inline, so I am not duplicating the same inline comment here):

  • fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/ProjectOtherJoinConditionForNestedLoopJoin.java:121 now keeps UniqueFunction expressions inline, but the later AddProjectForUniqueFunction.JoinRewrite path (fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AddProjectForUniqueFunction.java:199-239) still extracts duplicated UniqueFunction expressions into the left child project. The updated regression-test/data/nereids_rules_p0/unique_function/add_project_for_unique_function.out still shows random(1, 100) materialized below the NLJ for join_1, so evaluation granularity changes from per join pair to per left row. Please guard that downstream join rewrite too and add regression coverage for the duplicated-UniqueFunction / BETWEEN path.

Critical checkpoints:

  • Goal / correctness: Partially achieved. The set-operation guard looks correct, but the NLJ path is still semantically incorrect end-to-end because a later rewrite reintroduces the bug.
  • Minimality / focus: Yes. The patch is small and focused on the targeted rewrites plus regression coverage.
  • Concurrency: Not applicable. These are FE optimizer rewrites with no new concurrent state or locking.
  • Lifecycle / static init: Not applicable.
  • Config changes: None.
  • Compatibility / storage / protocol: None.
  • Parallel code paths: Not fully covered. AddProjectForUniqueFunction.JoinRewrite remains inconsistent with the new NLJ guard.
  • Conditional checks: The new containsUniqueFunction() checks are appropriate, but the downstream join rewrite needs the same protection.
  • Tests: Added shape-plan regression coverage for NLJ and set operations. However, the blocking NLJ case is still visible in the existing add_project_for_unique_function.out output, and the new tests are shape-only rather than result-validating.
  • Observability: No additional observability seems necessary for this planner-only change.
  • Transactions / persistence / FE-BE variable passing: Not applicable.
  • Performance: The conjunct split for set operations is straightforward and low-cost.
  • Other issues: No additional distinct issues found beyond the blocking problem above.

User focus:

  • No additional user-provided review focus.

… SetOp predicate inference rewrites

### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#62742

Problem Summary:

Three Nereids rewrite rules were not aware of `UniqueFunction`
(e.g. `rand()`, `uuid()`, `random()`) and could silently change the
semantics of SQL that uses such functions. This PR is batch 2 of the
`UniqueFunction` pushdown audit. The previously-reviewed batch 1 is in
apache#62742.

1. `PushDownFilterThroughSetOperation`
   Pushing a filter that references a `UniqueFunction` through
   `UNION DISTINCT` / `INTERSECT` / `EXCEPT` duplicates each unique-fn
   call per child, which changes result semantics (e.g. `rand() > 0.5`
   would be evaluated independently for each branch instead of once
   above the set op). `UNION ALL` remains safe, because the set op is a
   pure concatenation and every row evaluates the filter exactly once
   below or above. The rewrite now splits conjuncts: pushable conjuncts
   (`UNION ALL`, or not containing a unique function) go through the
   child branches; non-pushable conjuncts stay above the set op.

2. `ProjectOtherJoinConditionForNestedLoopJoin`
   The rule extracts sub-expressions of NLJ other conditions into
   child projects as aliases. An expression containing a unique
   function must not be materialized once above the scan and then
   referenced in the join, because that separates its evaluation from
   the join pair. Expressions that contain a unique function are now
   left inline in the other condition.

3. `InferPredicates` (EXCEPT / INTERSECT branches)
   `visitLogicalExcept` / `visitLogicalIntersect` substitute slots of
   the first child through the set-op output to sibling children so
   that predicates on the first child can be inferred onto sibling
   children. When the pulled-up predicate contains a unique function
   (e.g. `t1.id + rand() > 5`), the substitution rewrites it into a
   predicate on the sibling (`t2.id + rand() > 5`) and re-adds it,
   which re-evaluates `rand()` on a different set of rows. The JOIN
   path does not do this substitution, so its pre-existing slot
   subset check is sufficient. We now skip any inferred expression
   that contains a unique function inside the universal inference
   loop, which protects SetOp as well as any future consumer that
   does similar slot substitution.

All three fixes follow the same pattern used in batch 1: an early
no-op branch when `expression.containsUniqueFunction()` is true.
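
The skip described for `InferPredicates` can be sketched roughly as follows. This is an illustrative model only: `PulledUpPredicate` and `SiblingInference` are hypothetical names, slot substitution is simulated with plain string replacement, and the real rule works on expression trees rather than text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a predicate pulled up from the first child.
final class PulledUpPredicate {
    final String text;
    final boolean containsUniqueFunction;

    PulledUpPredicate(String text, boolean containsUniqueFunction) {
        this.text = text;
        this.containsUniqueFunction = containsUniqueFunction;
    }

    // Simulates substituting the first child's slots with the sibling's slots.
    PulledUpPredicate substitute(String fromTable, String toTable) {
        return new PulledUpPredicate(
                text.replace(fromTable + ".", toTable + "."), containsUniqueFunction);
    }
}

final class SiblingInference {
    // Infers predicates for a sibling child, skipping any predicate that
    // contains a unique function so it is never re-evaluated on other rows.
    static List<PulledUpPredicate> infer(
            List<PulledUpPredicate> pulledUp, String firstChild, String sibling) {
        List<PulledUpPredicate> inferred = new ArrayList<>();
        for (PulledUpPredicate p : pulledUp) {
            if (p.containsUniqueFunction) {
                continue; // early no-op: do not substitute or re-add
            }
            inferred.add(p.substitute(firstChild, sibling));
        }
        return inferred;
    }

    public static void main(String[] args) {
        List<PulledUpPredicate> pulledUp = List.of(
                new PulledUpPredicate("t1.id > 5", false),
                new PulledUpPredicate("t1.id + rand() > 5", true));
        for (PulledUpPredicate p : infer(pulledUp, "t1", "t2")) {
            System.out.println(p.text);
        }
    }
}
```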

### Release note

None

### Check List (For Author)

- Test: Regression test
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yujun777 yujun777 force-pushed the fix/unique-fn-pushdown-batch2 branch from 768c688 to 64712e3 Compare April 23, 2026 10:03
@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

/review

@yujun777
Contributor Author

Thanks for the review. Addressing the concern about AddProjectForUniqueFunction.JoinRewrite still materializing repeated UniqueFunctions into the left child project.

This is a pre-existing trade-off in that rule and is intentionally out of scope for this PR. Brief explanation:

  1. Doris expands expr BETWEEN a AND b into expr >= a AND expr <= b, which duplicates any UniqueFunction inside expr. Once duplicated, there is no in-planner way to recover the original single-call semantics "draw one random value and check it against both bounds", unless we keep the BETWEEN unsplit or re-deduplicate above the join — both of which are larger structural changes.

  2. Once the duplication exists inside an NLJ condition, the choice is between:

    • Leaving two independent UniqueFunction calls in the NLJ other-condition (per join pair, but two independent samplings, so P(a<=r<=b) becomes two independent events instead of one joint event);
    • Materializing one alias into the left child project (per-left-row evaluation; strictly fewer calls and shared by all right rows for the same left row).

    Neither preserves the user's intended BETWEEN semantics for a random expression, because those semantics were already destroyed by the BETWEEN-split step. The left-child materialization is chosen as the less-bad option: it at least keeps the two bound checks consistent with a single value, which is closer to what the user wrote. Two independent rand() calls checked against >= and <= are strictly worse for this common pattern.

  3. This PR targets the set of Nereids rewrite rules that introduce new duplication or move a UniqueFunction through a rule in a way that changes its evaluation cardinality. AddProjectForUniqueFunction.JoinRewrite does not introduce new duplication; it consolidates already-duplicated calls. It is therefore a separate concern and will be revisited, if at all, together with a deeper fix to the BETWEEN expansion or a join-condition-preserving dedup path.

  4. Existing regression coverage in add_project_for_unique_function.groovy (qt_join_1) already pins this trade-off shape, so any future change to it will be visible in a single reviewable diff.

For the three rules in this PR (PushDownFilterThroughSetOperation, ProjectOtherJoinConditionForNestedLoopJoin, and now InferPredicates for EXCEPT/INTERSECT), the fix scope is correct and complete.
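
The BETWEEN trade-off in points 1 and 2 can be shown with a scripted sequence of draws. This is a minimal sketch under stated assumptions: `BetweenExpansion` and `scripted` are hypothetical names, the draws are fixed integers rather than real random values, and the example ignores the join entirely to isolate the effect of duplicating the call.

```java
import java.util.function.IntSupplier;

final class BetweenExpansion {
    // Stand-in for a unique function that returns a scripted sequence of draws.
    static IntSupplier scripted(int... draws) {
        int[] index = {0};
        return () -> draws[index[0]++ % draws.length];
    }

    // expr >= lo AND expr <= hi, where each copy of expr calls the function again.
    static boolean expandedIndependent(IntSupplier fn, int lo, int hi) {
        int first = fn.getAsInt();
        int second = fn.getAsInt();
        return first >= lo && second <= hi;
    }

    // Same expansion, but both comparisons read one materialized alias.
    static boolean expandedSharedAlias(IntSupplier fn, int lo, int hi) {
        int alias = fn.getAsInt();
        return alias >= lo && alias <= hi;
    }

    public static void main(String[] args) {
        // Draws 50 then 5: neither value lies in [10, 20].
        System.out.println("independent calls: "
                + expandedIndependent(scripted(50, 5), 10, 20));
        System.out.println("shared alias:      "
                + expandedSharedAlias(scripted(50, 5), 10, 20));
    }
}
```

With the draws 50 and 5, the independent version accepts the row even though no single draw falls in the range, while the shared-alias version rejects it. That is the consistency property the comment above cites as the reason for preferring the alias.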

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 97.73% (43/44) 🎉
Increment coverage report
Complete coverage report

@yujun777
Contributor Author

yujun777 commented Jun 1, 2026

PR #62742 now includes this PR's changes, so this PR is no longer needed.

@yujun777 yujun777 closed this Jun 1, 2026