-
Notifications
You must be signed in to change notification settings - Fork 3.8k
[fix](nereids) Guard UniqueFunction in SetOperation and NLJ project rewrites #62754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -114,6 +114,13 @@ public Expression visit(Expression expression, ReplacerContext ctx) { | |
| if (input.isEmpty() || expression instanceof Slot) { | ||
| return expression; | ||
| } | ||
| // A mixed expression like `t1.a + rand() > t2.b` has inputSlots={t1.a}; if we alias | ||
| // it into a child Project, rand()'s evaluation granularity changes from "per join | ||
| // pair" to "per row of that child", which silently changes results. Keep such | ||
| // expressions inline in otherJoinConjuncts. | ||
| if (expression.containsUniqueFunction()) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This guard is still too local to preserve join-pair semantics. Can we also block the later join-side materialization for unique functions that live inside mixed-side
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Replied in the main PR comment: this is a pre-existing trade-off in |
||
| return expression; | ||
| } | ||
| if (ctx.leftSlots.containsAll(input)) { | ||
| Alias alias = ctx.aliasMap.computeIfAbsent(expression, o -> new Alias(o)); | ||
| ctx.leftAlias.add(alias); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| -- This file is automatically generated. You should know what you did if you want to edit this | ||
| -- !except_with_rand -- | ||
| PhysicalResultSink | ||
| --PhysicalExcept | ||
| ----PhysicalProject[t1.id] | ||
| ------filter(((cast(id as DOUBLE) + random()) > 5.0)) | ||
| --------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------PhysicalOlapScan[t2] | ||
|
|
||
| -- !intersect_with_rand -- | ||
| PhysicalResultSink | ||
| --PhysicalIntersect | ||
| ----PhysicalProject[t1.id] | ||
| ------filter(((cast(id as DOUBLE) + random()) > 5.0)) | ||
| --------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------PhysicalOlapScan[t2] | ||
|
|
||
| -- !except_with_uuid -- | ||
| PhysicalResultSink | ||
| --PhysicalExcept | ||
| ----PhysicalProject[t1.id] | ||
| ------filter(((uuid_to_int(uuid()) + cast(id as LARGEINT)) > 5)) | ||
| --------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------PhysicalOlapScan[t2] | ||
|
|
||
| -- !intersect_with_uuid -- | ||
| PhysicalResultSink | ||
| --PhysicalIntersect | ||
| ----PhysicalProject[t1.id] | ||
| ------filter(((uuid_to_int(uuid()) + cast(id as LARGEINT)) > 5)) | ||
| --------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------PhysicalOlapScan[t2] | ||
|
|
||
| -- !except_deterministic -- | ||
| PhysicalResultSink | ||
| --PhysicalExcept | ||
| ----PhysicalProject[t1.id] | ||
| ------filter((t1.id > 4)) | ||
| --------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------filter((t2.id > 4)) | ||
| --------PhysicalOlapScan[t2] | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| -- This file is automatically generated. You should know what you did if you want to edit this | ||
| -- !left_join_rand_one_side -- | ||
| PhysicalResultSink | ||
| --NestedLoopJoin[LEFT_OUTER_JOIN]((cast(id as DOUBLE) + random()) < cast(id as DOUBLE)) | ||
| ----PhysicalProject[t1.id] | ||
| ------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------PhysicalOlapScan[t2] | ||
|
|
||
| -- !cross_rand_one_side -- | ||
| PhysicalResultSink | ||
| --NestedLoopJoin[INNER_JOIN]((cast(id as DOUBLE) + random()) > cast(id as DOUBLE)) | ||
| ----PhysicalProject[t1.id] | ||
| ------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------PhysicalOlapScan[t2] | ||
|
|
||
| -- !cross_rand_both_sides -- | ||
| PhysicalResultSink | ||
| --NestedLoopJoin[INNER_JOIN]((cast(id as DOUBLE) + random()) > (cast(id as DOUBLE) + random())) | ||
| ----PhysicalProject[t1.id] | ||
| ------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------PhysicalOlapScan[t2] | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| -- This file is automatically generated. You should know what you did if you want to edit this | ||
| -- !union_distinct_keep_rand -- | ||
| PhysicalResultSink | ||
| --filter((random() > 0.1)) | ||
| ----hashAgg[GLOBAL] | ||
| ------hashAgg[LOCAL] | ||
| --------PhysicalUnion | ||
| ----------PhysicalProject[t1.id] | ||
| ------------PhysicalOlapScan[t1] | ||
| ----------PhysicalProject[t2.id] | ||
| ------------PhysicalOlapScan[t2] | ||
|
|
||
| -- !intersect_keep_rand -- | ||
| PhysicalResultSink | ||
| --filter((random() > 0.1)) | ||
| ----PhysicalIntersect | ||
| ------PhysicalProject[t1.id] | ||
| --------PhysicalOlapScan[t1] | ||
| ------PhysicalProject[t2.id] | ||
| --------PhysicalOlapScan[t2] | ||
|
|
||
| -- !except_keep_rand -- | ||
| PhysicalResultSink | ||
| --filter((random() > 0.1)) | ||
| ----PhysicalExcept | ||
| ------PhysicalProject[t1.id] | ||
| --------PhysicalOlapScan[t1] | ||
| ------PhysicalProject[t2.id] | ||
| --------PhysicalOlapScan[t2] | ||
|
|
||
| -- !union_all_push_rand -- | ||
| PhysicalResultSink | ||
| --PhysicalUnion | ||
| ----PhysicalProject[t1.id] | ||
| ------filter((random() > 0.1)) | ||
| --------PhysicalOlapScan[t1] | ||
| ----PhysicalProject[t2.id] | ||
| ------filter((random() > 0.1)) | ||
| --------PhysicalOlapScan[t2] | ||
|
|
||
| -- !union_distinct_split -- | ||
| PhysicalResultSink | ||
| --filter((random() > 0.1)) | ||
| ----PhysicalLimit[GLOBAL] | ||
| ------PhysicalUnion | ||
| --------PhysicalProject[t1.id] | ||
| ----------filter((t1.id = 1)) | ||
| ------------PhysicalOlapScan[t1] | ||
| --------PhysicalProject[t2.id] | ||
| ----------filter((t2.id = 1)) | ||
| ------------PhysicalOlapScan[t2] | ||
|
|
||
| -- !intersect_split -- | ||
| PhysicalResultSink | ||
| --filter((random() > 0.1)) | ||
| ----PhysicalIntersect | ||
| ------PhysicalProject[t1.id] | ||
| --------filter((t1.id = 1)) | ||
| ----------PhysicalOlapScan[t1] | ||
| ------PhysicalProject[t2.id] | ||
| --------filter((t2.id = 1)) | ||
| ----------PhysicalOlapScan[t2] | ||
|
|
||
| -- !except_split -- | ||
| PhysicalResultSink | ||
| --filter((random() > 0.1)) | ||
| ----PhysicalExcept | ||
| ------PhysicalProject[t1.id] | ||
| --------filter((t1.id = 1)) | ||
| ----------PhysicalOlapScan[t1] | ||
| ------PhysicalProject[t2.id] | ||
| --------filter((t2.id = 1)) | ||
| ----------PhysicalOlapScan[t2] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This guard stops
PROJECT_OTHER_JOIN_CONDITIONfrom pushing the expression into a child project, but the same semantic change still happens later inAddProjectForUniqueFunction.JoinRewrite. The updatedregression-test/data/nereids_rules_p0/unique_function/add_project_for_unique_function.outstill showst1.id + t2.id + random(1, 100) between 10 and 20becoming aPhysicalProject[random(...) AS ...]under the NLJ, sorandom()is evaluated once per left row instead of once per join pair. Please guard that later join rewrite too (or otherwise preserve per-pair evaluation), and add a regression that exercises theBETWEEN/ duplicated-UniqueFunctionpath.`There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied in the main PR comment: this is a pre-existing trade-off in
AddProjectForUniqueFunction.JoinRewritecaused byBETWEENexpansion into two independentrand()calls, and is intentionally out of scope for this PR. Please see the main comment for the full reasoning.