[fix](nereids) Block push-down of filters containing unique functions through Generate and CTE consumer by yujun777 · Pull Request #62705 · apache/doris

yujun777 · 2026-04-22T08:38:02Z

What problem does this PR solve?

Issue Number: close #25201, close #25202

Problem Summary:
Two Nereids rewrite rules moved filter conjuncts that contain non-idempotent UniqueFunction calls (rand / uuid / random_bytes / uuid_numeric) across operators that change how many times the unique function is evaluated, producing wrong results.

PushDownFilterThroughGenerate pushed a conjunct like t1.id + rand(1,100) > 5 below LogicalGenerate. Before the push, rand is evaluated per generated row; after, it is evaluated per base row and then the result is duplicated for every row produced by generate, so groups of N generated rows share a single rand value instead of N independent ones.
CollectFilterAboveConsumer registered filter conjuncts above a CTE consumer into cascadesContext.putConsumerIdToFilter(...), after which RewriteCteChildren.tryToConstructFilter would OR them up and push them into the CTE producer. For a conjunct like rand() > 0.1, that causes the random filter to run on both the producer scan and each consumer filter, and different consumers would see inconsistent rows.

Fix: in both rules, skip conjuncts whose containsUniqueFunction() is true so they stay above the operator and are evaluated once per output row.

Adjacent rules (PushDownFilterThroughRepeat/Window/PartitionTopN/SetOperation, PullUpPredicates and its consumers) have the same class of bug but are out of scope for this PR and will be addressed separately.

Release note

Fix wrong results when filters containing non-idempotent functions (rand / uuid / random_bytes / uuid_numeric) are combined with LATERAL VIEW explode_* / LogicalGenerate or CTE consumers.

Check List (For Author)

Test:
- Regression test (new suites under regression-test/suites/nereids_rules_p0/unique_function/)
- Unit Test (new PushDownFilterThroughGenerateTest and CollectFilterAboveConsumerTest under fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/)
Behavior changed: Yes. Filter conjuncts containing UniqueFunction are no longer pushed below LogicalGenerate and are no longer collected into a CTE producer. Deterministic conjuncts still behave as before.
Does this need documentation: No.

… through Generate and CTE consumer ### What problem does this PR solve? Issue Number: close apache#25201, close apache#25202 Problem Summary: Two Nereids rewrite rules moved filter conjuncts that contain non-idempotent UniqueFunction calls (rand / uuid / random_bytes / uuid_numeric) across operators that change how many times the unique function is evaluated, producing wrong results. 1. `PushDownFilterThroughGenerate` pushed a conjunct like `t1.id + rand(1,100) > 5` below `LogicalGenerate`. Before the push, rand is evaluated per generated row; after, it is evaluated per base row and then the result is duplicated for every row produced by generate, so groups of N generated rows share a single rand value instead of N independent ones. 2. `CollectFilterAboveConsumer` registered filter conjuncts above a CTE consumer into `cascadesContext.putConsumerIdToFilter(...)`, after which `RewriteCteChildren.tryToConstructFilter` would OR them up and push them into the CTE producer. For a conjunct like `rand() > 0.1`, that causes the random filter to run on both the producer scan and each consumer filter, and different consumers would see inconsistent rows. Fix: in both rules, skip conjuncts whose `containsUniqueFunction()` is true so they stay above the operator and are evaluated once per output row. Adjacent rules (`PushDownFilterThroughRepeat/Window/PartitionTopN/SetOperation`, `PullUpPredicates` and its consumers) have the same class of bug but are out of scope for this PR and will be addressed separately. ### Release note Fix wrong results when filters containing non-idempotent functions (rand / uuid / random_bytes / uuid_numeric) are combined with `LATERAL VIEW explode_*` / `LogicalGenerate` or CTE consumers. ### Check List (For Author) - Test: - Regression test (new suites under `regression-test/suites/nereids_rules_p0/unique_function/`) - Unit Test (new `PushDownFilterThroughGenerateTest` and `CollectFilterAboveConsumerTest` under `fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/`) - Behavior changed: Yes. Filter conjuncts containing UniqueFunction are no longer pushed below `LogicalGenerate` and are no longer collected into a CTE producer. Deterministic conjuncts still behave as before. - Does this need documentation: No. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Thearas · 2026-04-22T08:38:08Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

yujun777 · 2026-04-22T08:38:09Z

run buildall

yujun777 · 2026-04-22T08:38:11Z

/review

hello-stephen · 2026-04-22T10:15:01Z

FE UT Coverage Report

Increment line coverage 75.00% (3/4) 🎉
Increment coverage report
Complete coverage report

github-actions · 2026-04-22T10:38:50Z

OpenCode automated review failed and did not complete.

Error: Review step was skipped (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24768762135

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

hello-stephen · 2026-04-22T11:11:28Z

FE Regression Coverage Report

Increment line coverage 100.00% (4/4) 🎉
Increment coverage report
Complete coverage report

starocean999 · 2026-04-23T02:01:02Z

/review

github-actions · 2026-04-23T02:01:32Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-04-23T02:01:35Z

PR approved by anyone and no changes requested.

github-actions

{
"overall": "comment",
"verdict": "no_blocking_findings",
"findings": [],
"critical_checkpoints": {
"goal": "The PR goal is to stop moving filters that contain UniqueFunction calls through LogicalGenerate and through CTE-consumer collection. The changed rules accomplish that direct goal, and the added FE unit tests plus regression suites cover those direct paths.",
"minimality": "The modification is small, clear, and focused: each affected rule adds a narrow containsUniqueFunction guard without changing unrelated behavior.",
"concurrency": "No concurrency, locking, or lifecycle-sensitive state is involved in the touched code paths.",
"lifecycle": "No special lifecycle or initialization-order concerns were introduced.",
"config": "No configuration was added or changed.",
"compatibility": "No FE-BE protocol, storage-format, or rolling-upgrade compatibility surface was changed.",
"parallel_paths": "There are still adjacent optimizer paths of the same bug class mentioned in the PR description, but I did not find a blocking defect in the specific paths modified by this PR.",
"special_conditions": "The new containsUniqueFunction checks are appropriate here and consistent with existing Nereids safeguards in other pushdown/rewrite rules.",
"tests": "Direct rule coverage is present in FE unit tests and regression shape-plan tests. Residual risk: the new regression coverage is structural rather than runtime-result based.",
"test_results": "The new regression outputs are consistent with the added suites and appear auto-generated.",
"observability": "No additional observability is needed for this optimizer rewrite change.",
"transactions_persistence": "Not applicable.",
"data_writes": "Not applicable.",
"fe_be_variables": "Not applicable.",
"performance": "No obvious performance concern; the extra containsUniqueFunction checks are trivial.",
"other": "No additional blocking issue found in the touched code."
},
"focus_points": {
"user_focus": "No additional user-provided focus points.",
"response": "No extra issue was found beyond the general review."
},
"residual_risks": [
"The new regression coverage mainly checks plan shape, so it does not independently assert runtime row-set correctness for the wrong-result scenarios."
]
}

… through Generate and CTE consumer (#62705) Issue Number: close #25201, close #25202 Problem Summary: Two Nereids rewrite rules moved filter conjuncts that contain non-idempotent `UniqueFunction` calls (`rand` / `uuid` / `random_bytes` / `uuid_numeric`) across operators that change how many times the unique function is evaluated, producing wrong results. 1. `PushDownFilterThroughGenerate` pushed a conjunct like `t1.id + rand(1,100) > 5` below `LogicalGenerate`. Before the push, `rand` is evaluated per generated row; after, it is evaluated per base row and then the result is duplicated for every row produced by generate, so groups of N generated rows share a single rand value instead of N independent ones. 2. `CollectFilterAboveConsumer` registered filter conjuncts above a CTE consumer into `cascadesContext.putConsumerIdToFilter(...)`, after which `RewriteCteChildren.tryToConstructFilter` would OR them up and push them into the CTE producer. For a conjunct like `rand() > 0.1`, that causes the random filter to run on both the producer scan and each consumer filter, and different consumers would see inconsistent rows. Fix: in both rules, skip conjuncts whose `containsUniqueFunction()` is true so they stay above the operator and are evaluated once per output row. Adjacent rules (`PushDownFilterThroughRepeat/Window/PartitionTopN/SetOperation`, `PullUpPredicates` and its consumers) have the same class of bug but are out of scope for this PR and will be addressed separately.

…in pushdown ### What problem does this PR solve? Issue Number: N/A Related PR: apache#62705 Problem Summary: Following up on apache#62705, several Nereids rewrite rules still moved predicates / conjuncts containing non-idempotent (unique) functions such as rand() / uuid() / random_bytes() across operator boundaries in ways that changed query semantics. The common root cause is that a predicate like `rand() > 0.5` has an empty input-slot set, so the `containsAll(emptySet)` / `allMatch(emptyStream)` guards used by these rules silently returned true and allowed the push down. This PR adds `containsUniqueFunction()` guards to the reachable cases: 1. PushDownFilterThroughWindow: skip conjuncts with unique functions. Pushing a unique predicate below a window operator re-samples the base rows and changes every window-function value. 2. PushDownFilterThroughPartitionTopN: same as Window — skip unique conjuncts in the split loop. 3. PushDownJoinOtherCondition: keep conjuncts with unique functions in the join's otherJoinConjuncts instead of pushing them onto either input. Pushing `rand() > 0.5` from an INNER JOIN predicate below the join means each output row is filtered by a different random draw than the join semantic requires, and LEFT/RIGHT JOIN pushes flip filtered vs. null-padded rows entirely. Other candidates that were initially explored (Repeat / Sort elimination / TopN-through-Union) were empirically verified to be unreachable due to upstream normalization (PushDownFilterThroughAggregation already blocks unique conjuncts before they reach Repeat; NormalizeSort wraps rand() in a Project; TopN order keys are already materialized Slots), so no dead guards were added for those. ### Release note Fix wrong results when predicates containing rand() / uuid() / random_bytes() / uuid_numeric() were pushed below Window / PartitionTopN, or moved out of a Join's other-conjuncts list onto one of the join inputs. ### Check List (For Author) - Test: - Unit Test (FE): PushDownFilterThroughWindowTest (+1 unique-fn test), PushDownJoinOtherConditionTest (+1 unique-fn test). - Regression Test: 3 new suites under regression-test/suites/nereids_rules_p0/unique_function/ (window, partition_topn, join_other_condition). - Behavior changed: Yes (queries that previously returned incorrect results due to unsafe pushdown of unique-function predicates across Window / PartitionTopN / Join boundaries will now return correct results). - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ue functions through Generate and CTE consumer #62705 (#62734) Cherry-picked from #62705 Co-authored-by: yujun <yujun@selectdb.com>

…in pushdown ### What problem does this PR solve? Issue Number: N/A Related PR: apache#62705 Problem Summary: Following up on apache#62705, several Nereids rewrite rules still moved predicates / conjuncts containing non-idempotent (unique) functions such as rand() / uuid() / random_bytes() across operator boundaries in ways that changed query semantics. The common root cause is that a predicate like `rand() > 0.5` has an empty input-slot set, so the `containsAll(emptySet)` / `allMatch(emptyStream)` guards used by these rules silently returned true and allowed the push down. This PR adds guards to the reachable cases: 1. PushDownFilterThroughWindow: skip conjuncts with unique functions. Pushing a unique predicate below a window operator re-samples the base rows and changes every window-function value. 2. PushDownFilterThroughPartitionTopN: same as Window — skip unique conjuncts in the split loop. 3. PushDownJoinOtherCondition: keep conjuncts with unique functions AND empty input slots in the join's otherJoinConjuncts. `rand() > 0.5` with no slots would otherwise be pushed arbitrarily to the left child because `leftOutput.containsAll(emptySet)` is always true. When the unique conjunct has side-specific input slots (e.g. `t1.id + rand() > 0.5`), push-down is still allowed — output cardinality expectation is preserved and pre-join evaluation is what users typically want for sampling predicates. 4. PushDownFilterThroughJoin: same as (3) for the filter-through-join path. Also protects the "no slots → duplicate to both children" branch, which would otherwise produce two independent random draws per row. Other candidates that were initially explored (Repeat / Sort elimination / TopN-through-Union) were empirically verified to be unreachable due to upstream normalization (PushDownFilterThroughAggregation already blocks unique conjuncts before they reach Repeat; NormalizeSort wraps rand() in a Project; TopN order keys are already materialized Slots), so no dead guards were added for those. ### Release note Fix wrong results when `rand() > 0.5`-style predicates with empty input slots were pushed below Window / PartitionTopN, or moved out of a Join's other-conjuncts / above-Join filter onto one of the join inputs. ### Check List (For Author) - Test: - Regression Test: 3 suites under regression-test/suites/nereids_rules_p0/unique_function/ (window, partition_topn, join_other_condition) plus new cases in push_down_filter_through_join_with_unique_function for the side-specific unique-slot vs empty-slot distinction. - Behavior changed: Yes (queries that previously returned incorrect results due to unsafe pushdown of unique-function predicates across Window / PartitionTopN / Join boundaries will now return correct results). - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… through Generate and CTE consumer (apache#62705) Issue Number: close apache#25201, close apache#25202 Problem Summary: Two Nereids rewrite rules moved filter conjuncts that contain non-idempotent `UniqueFunction` calls (`rand` / `uuid` / `random_bytes` / `uuid_numeric`) across operators that change how many times the unique function is evaluated, producing wrong results. 1. `PushDownFilterThroughGenerate` pushed a conjunct like `t1.id + rand(1,100) > 5` below `LogicalGenerate`. Before the push, `rand` is evaluated per generated row; after, it is evaluated per base row and then the result is duplicated for every row produced by generate, so groups of N generated rows share a single rand value instead of N independent ones. 2. `CollectFilterAboveConsumer` registered filter conjuncts above a CTE consumer into `cascadesContext.putConsumerIdToFilter(...)`, after which `RewriteCteChildren.tryToConstructFilter` would OR them up and push them into the CTE producer. For a conjunct like `rand() > 0.1`, that causes the random filter to run on both the producer scan and each consumer filter, and different consumers would see inconsistent rows. Fix: in both rules, skip conjuncts whose `containsUniqueFunction()` is true so they stay above the operator and are evaluated once per output row. Adjacent rules (`PushDownFilterThroughRepeat/Window/PartitionTopN/SetOperation`, `PullUpPredicates` and its consumers) have the same class of bug but are out of scope for this PR and will be addressed separately. (cherry picked from commit 55ee1fa)

Keep the PR apache#62705 backport test compatible with branch-4.0 by using ExplodeNumbers instead of the newer Unnest generator. Key changes: - replace Unnest with ExplodeNumbers in PushDownFilterThroughGenerateTest Unit Test: - ./build.sh --fe --clean -j20 - ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PushDownFilterThroughGenerateTest,org.apache.doris.nereids.rules.rewrite.CollectFilterAboveConsumerTest

…guards (#62705) (#62750) ### What problem does this PR solve? Backport of #62705 to branch-4.0. Problem Summary: - backport the unique-function push-down guards for Generate and CTE consumer - adapt PushDownFilterThroughGenerateTest to use ExplodeNumbers on branch-4.0 ### Release note Fix wrong results on branch-4.0 when filters containing non-idempotent functions are pushed through Generate or CTE consumer. ### Check List (For Author) - Test: - FE build: ./build.sh --fe --clean -j20 - FE unit test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PushDownFilterThroughGenerateTest,org.apache.doris.nereids.rules.rewrite.CollectFilterAboveConsumerTest - Behavior changed: Yes. Same fix scope as #62705 on branch-4.0. - Does this need documentation: No

…in pushdown Issue Number: N/A Related PR: apache#62705 Problem Summary: Following up on apache#62705, several Nereids rewrite rules still moved predicates / conjuncts containing non-idempotent (unique) functions such as rand() / uuid() / random_bytes() across operator boundaries in ways that changed query semantics. The common root cause is that a predicate like `rand() > 0.5` has an empty input-slot set, so the `containsAll(emptySet)` / `allMatch(emptyStream)` guards used by these rules silently returned true and allowed the push down. This PR adds guards to the reachable cases: 1. PushDownFilterThroughWindow: skip conjuncts with unique functions. Pushing a unique predicate below a window operator re-samples the base rows and changes every window-function value. 2. PushDownFilterThroughPartitionTopN: same as Window — skip unique conjuncts in the split loop. 3. PushDownJoinOtherCondition: keep conjuncts with unique functions AND empty input slots in the join's otherJoinConjuncts. `rand() > 0.5` with no slots would otherwise be pushed arbitrarily to the left child because `leftOutput.containsAll(emptySet)` is always true. When the unique conjunct has side-specific input slots (e.g. `t1.id + rand() > 0.5`), push-down is still allowed — output cardinality expectation is preserved and pre-join evaluation is what users typically want for sampling predicates. 4. PushDownFilterThroughJoin: same as (3) for the filter-through-join path. Also protects the "no slots → duplicate to both children" branch, which would otherwise produce two independent random draws per row. Other candidates that were initially explored (Repeat / Sort elimination / TopN-through-Union) were empirically verified to be unreachable due to upstream normalization (PushDownFilterThroughAggregation already blocks unique conjuncts before they reach Repeat; NormalizeSort wraps rand() in a Project; TopN order keys are already materialized Slots), so no dead guards were added for those. Fix wrong results when `rand() > 0.5`-style predicates with empty input slots were pushed below Window / PartitionTopN, or moved out of a Join's other-conjuncts / above-Join filter onto one of the join inputs. - Test: - Regression Test: 3 suites under regression-test/suites/nereids_rules_p0/unique_function/ (window, partition_topn, join_other_condition) plus new cases in push_down_filter_through_join_with_unique_function for the side-specific unique-slot vs empty-slot distinction. - Behavior changed: Yes (queries that previously returned incorrect results due to unsafe pushdown of unique-function predicates across Window / PartitionTopN / Join boundaries will now return correct results). - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

starocean999 approved these changes Apr 23, 2026

View reviewed changes

github-actions Bot added approved Indicates a PR has been approved by one committer. reviewed labels Apr 23, 2026

starocean999 merged commit 55ee1fa into apache:master Apr 23, 2026
31 of 32 checks passed

github-actions Bot reviewed Apr 23, 2026

View reviewed changes

starocean999 added dev/4.0.x dev/4.1.x labels Apr 23, 2026

This was referenced Apr 23, 2026

[fix](nereids) Guard UniqueFunction in multiple filter/topn pushdown rules #62742

Open

branch-4.0： [fix](nereids) Backport unique-function filter push-down guards (#62705) #62750

Merged

yiguolei pushed a commit that referenced this pull request Apr 23, 2026

branch-4.1: [fix](nereids) Block push-down of filters containing uniq…

33baee5

…ue functions through Generate and CTE consumer #62705 (#62734) Cherry-picked from #62705 Co-authored-by: yujun <yujun@selectdb.com>

yiguolei removed the dev/4.0.x label May 7, 2026

yiguolei added the dev/4.0.6-merged label May 7, 2026

morrySnow added dev/4.1.1-merged and removed dev/4.1.x labels May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix](nereids) Block push-down of filters containing unique functions through Generate and CTE consumer#62705

[fix](nereids) Block push-down of filters containing unique functions through Generate and CTE consumer#62705
starocean999 merged 1 commit into
apache:masterfrom
yujun777:fix-push-down-filter-unique-function

yujun777 commented Apr 22, 2026

Uh oh!

Thearas commented Apr 22, 2026

Uh oh!

yujun777 commented Apr 22, 2026

Uh oh!

yujun777 commented Apr 22, 2026

Uh oh!

hello-stephen commented Apr 22, 2026

Uh oh!

github-actions Bot commented Apr 22, 2026

Uh oh!

hello-stephen commented Apr 22, 2026

Uh oh!

starocean999 commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

yujun777 commented Apr 22, 2026

What problem does this PR solve?

Release note

Check List (For Author)

Uh oh!

Thearas commented Apr 22, 2026

Uh oh!

yujun777 commented Apr 22, 2026

Uh oh!

yujun777 commented Apr 22, 2026

Uh oh!

hello-stephen commented Apr 22, 2026

FE UT Coverage Report

Uh oh!

github-actions Bot commented Apr 22, 2026

Uh oh!

hello-stephen commented Apr 22, 2026

FE Regression Coverage Report

Uh oh!

starocean999 commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants