-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-6247][SQL] Fix resolution of ambiguous joins caused by new aliases #5062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #28700 has finished for PR 5062 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marmbrus Only handle the Project probably not enough. For example:
test("self join with aliases#1") {
Seq(1).map(i => (i, i.toString)).toDF("int", "str").groupBy($"str").max("int").as("str").registerTempTable("df")
Seq(1).map(i => (i, i.toString)).toDF("int", "str").groupBy($"str").max("int").as("str").explain(true)
checkAnswer(
sql(
"""
|SELECT x.str, y.str
|FROM df x JOIN df y ON x.str = y.str
""".stripMargin),
Row(1, 1) :: Nil)
}
Output in console like
'Subquery str
'Aggregate ['str], ['str,MAX(int#48) AS MAX(int)#50]
Project [_1#46 AS int#48,_2#47 AS str#49]
LocalRelation [_1#46,_2#47], [[1,1]]
Aggregate [str#49], [str#49,MAX(int#48) AS MAX(int)#50]
Project [_1#46 AS int#48,_2#47 AS str#49]
LocalRelation [_1#46,_2#47], [[1,1]]
Aggregate [str#49], [str#49,MAX(int#48) AS MAX(int)#50]
LocalRelation [int#48,str#49], [[1,1]]
Aggregate false, [str#49], [str#49,MAX(PartialMax#53) AS MAX(int)#50]
Exchange (HashPartitioning [str#49], 5)
Aggregate true, [str#49], [str#49,MAX(int#48) AS PartialMax#53]
LocalTableScan [int#48,str#49], [[1,1]]
Code Generation: false
== RDD ==
next on empty iterator
java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:64)
at scala.collection.IterableLike$class.head(IterableLike.scala:91)
at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:47)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:120)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a doable solution is add a Project node on top of cached plan or temporal table when lookupRelation.
|
LGTM, and we can keep adding more rule if we missed one for the self join in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we ignore the qualifiers in the equal test? Will that cause another bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
|
Test build #28756 has finished for PR 5062 at commit
|
|
Thanks for looking this over and helping with the fix! Merging to master and 1.3 |
…ases We need to handle ambiguous `exprId`s that are produced by new aliases as well as those caused by leaf nodes (`MultiInstanceRelation`). Attempting to fix this revealed a bug in `equals` for `Alias` as these objects were comparing equal even when the expression ids did not match. Additionally, `LocalRelation` did not correctly provide statistics, and some tests in `catalyst` and `hive` were not using the helper functions for comparing plans. Based on #4991 by chenghao-intel Author: Michael Armbrust <michael@databricks.com> Closes #5062 from marmbrus/selfJoins and squashes the following commits: 8e9b84b [Michael Armbrust] check qualifier too 8038a36 [Michael Armbrust] handle aggs too 0b9c687 [Michael Armbrust] fix more tests c3c574b [Michael Armbrust] revert change. 725f1ab [Michael Armbrust] add statistics a925d08 [Michael Armbrust] check for conflicting attributes in join resolution b022ef7 [Michael Armbrust] Handle project aliases. d8caa40 [Michael Armbrust] test case: SPARK-6247 f9c67c2 [Michael Armbrust] Check for duplicate attributes in join resolution. 898af73 [Michael Armbrust] Fix Alias equality. (cherry picked from commit 3579003) Signed-off-by: Michael Armbrust <michael@databricks.com>
|
Test build #28761 has finished for PR 5062 at commit
|
We need to handle ambiguous
exprIds that are produced by new aliases as well as those caused by leaf nodes (MultiInstanceRelation).Attempting to fix this revealed a bug in
equalsforAliasas these objects were comparing equal even when the expression ids did not match. Additionally,LocalRelationdid not correctly provide statistics, and some tests incatalystandhivewere not using the helper functions for comparing plans.Based on #4991 by @chenghao-intel