[SPARK-35845][SQL] OuterReference resolution should reject ambiguous column names #33004
cloud-fan wants to merge 3 commits into apache:master from
@@ -220,7 +220,7 @@ object PullupCorrelatedPredicates extends Rule[LogicalPlan] with PredicateHelper
 */
The changes in this file are not strictly necessary; they just match the code on the analyzer side: when we need to pass around an outer plan, pass the plan itself instead of its children.
Thanks, merging to master
What changes were proposed in this pull request?
The current OuterReference resolution is a bit weird: when the outer plan has more than one child, it resolves OuterReference from the output of each child, one by one, left to right.
This is incorrect for joins, because the column name is ambiguous if both the left and right sides output that column.
This PR fixes this bug by resolving OuterReference with outerPlan.resolveChildren, instead of something like outerPlan.children.foreach(_.resolve(...)).
Why are the changes needed?
bug fix
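To make the bug concrete, here is a minimal sketch of the two resolution strategies on a toy plan model. It is not Spark's code: `Plan`, `resolveLeftToRight`, `resolveAcrossChildren`, and the error text are hypothetical names invented for illustration, and the real analyzer works on attributes rather than plain strings.

```scala
// Toy model (an assumption, not Spark's LogicalPlan): a node only knows
// its name, its output column names, and its children.
final case class Plan(name: String, output: Seq[String], children: Seq[Plan] = Nil)

object OuterRefSketch {
  // Behaviour described as the bug: try each child in order, first match wins,
  // so a column present on both join sides silently binds to the left side.
  def resolveLeftToRight(outer: Plan, col: String): Option[String] =
    outer.children.collectFirst {
      case child if child.output.contains(col) => s"${child.name}.$col"
    }

  // Behaviour in the spirit of the fix: consider the output of all children
  // together and reject the reference when more than one child provides it.
  def resolveAcrossChildren(outer: Plan, col: String): Either[String, Option[String]] = {
    val matches = outer.children.filter(_.output.contains(col)).map(c => s"${c.name}.$col")
    matches match {
      case Seq()       => Right(None)          // not an outer reference here
      case Seq(single) => Right(Some(single))  // unambiguous
      case many        => Left(s"column '$col' is ambiguous: ${many.mkString(", ")}") // illustrative message only
    }
  }

  // Example outer plan: a join whose two sides both output "id".
  val left: Plan  = Plan("t1", Seq("id", "a"))
  val right: Plan = Plan("t2", Seq("id", "b"))
  val join: Plan  = Plan("join", left.output ++ right.output, Seq(left, right))

  def main(args: Array[String]): Unit = {
    println(resolveLeftToRight(join, "id"))     // Some(t1.id)
    println(resolveAcrossChildren(join, "id"))  // Left(column 'id' is ambiguous: t1.id, t2.id)
    println(resolveAcrossChildren(join, "a"))   // Right(Some(t1.a))
  }
}
```

In this sketch the left-to-right variant quietly picks `t1.id`, while the second variant surfaces the ambiguity as an error, which is the behaviour the PR title asks for.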
Does this PR introduce any user-facing change?
The problem only occurs in joins, and join conditions don't support correlated subqueries yet, so this PR only improves the error message. Before this PR, people see
How was this patch tested?
a new test