-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13651] Generator outputs are not resolved correctly resulting in run time error #11497
Conversation
@@ -92,6 +92,16 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { | |||
checkAnswer(query, Row(1, 1) :: Row(1, 2) :: Row(1, 3) :: Nil) | |||
} | |||
|
|||
test("SPARK-13651: generator outputs shouldn't be resolved from its child's outpu") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: outpu -> output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed it.
ok to test |
LGTM. Please update your PR description with runtime error messages you hit. (Not necessary to post the whole stack. I think, just the first few lines are enough) Thanks! |
Test build #2610 has finished for PR 11497 at commit
|
retest this please |
I ran this test 5 times in my development machine and it failed once. It looks like an intermittent failure. Also i verified the plan for the failing test.
There is no Generate in the plan and so the fix shouldn't affect this testcase. |
retest this please |
Test build #52432 has finished for PR 11497 at commit
|
|
||
val query = | ||
input | ||
.generate(Explode(generatorInput), join = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a valid plan? The input to Explode
is an Attribute
named a
, which is not in the output of input
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Thank you. You are right. Wenchen, i just realized that its pretty hard to simulate the error in AnalysisSuite. For this problem to happen, we need to have the rules fired in following sequence.
- First ResolveGenerate be a no-op because the generator is not resolved.
- Generator is resolved through ResolveFunction.
- ResolveReference now resolves the generator output attributes from child;s output
in AnalysisSuite we have an empty function registry thus i am unable to simulate this error in this
test. If you are ok, i am thinking of removing this test and getting it tested through SQLQuerySuite.
Please let me know what you think.
@@ -512,6 +512,9 @@ class Analyzer( | |||
|
|||
// A special case for Generate, because the output of Generate should not be resolved by | |||
// ResolveReferences. Attributes in the output will be resolved by ResolveGenerate. | |||
case g @ Generate(generator, _, _, _, _, _) | |||
if !g.resolved && generator.resolved => g | |||
|
|||
case g @ Generate(generator, join, outer, qualifier, output, child) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this 2 cases can be simplified to:
case g: Generate if g.generator.resolved => g
case g @ Generate(generator, join, outer, qualifier, output, child) =>
the generator resolution logic...
We only care about whether generator
is resolved or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Thanks !! Made the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we still keep if child.resolved
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davies Hi Davis, I also was thinking about it. I felt its probably safer to handle the generate plan by these two cases and not fall through the last case like we do for this defect.
@cloud-fan What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if child.resolved
is guaranteed at the beginning of this rule:
case p: LogicalPlan if !p.childrenResolved => p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Thank you !!
LGTM, cc @davies (who fixed this special case before) |
@cloud-fan Can we trigger a test please ? |
test this please |
retest this please |
Test build #52540 has finished for PR 11497 at commit
|
Merging into master, thanks! |
…pressions ## What changes were proposed in this pull request? It's weird that expressions don't always have all the expressions in it. This PR marks `QueryPlan.expressions` final to forbid sub classes overriding it to exclude some expressions. Currently only `Generate` override it, we can use `producedAttributes` to fix the unresolved attribute problem for it. Note that this PR doesn't fix the problem in #11497 ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #11532 from cloud-fan/generate.
@cloud-fan @davies @gatorsmile Thank you !! |
…in run time error ## What changes were proposed in this pull request? ``` Seq(("id1", "value1")).toDF("key", "value").registerTempTable("src") sqlContext.sql("SELECT t1.* FROM src LATERAL VIEW explode(map('key1', 100, 'key2', 200)) t1 AS key, value") ``` Results in following logical plan ``` Project [key#2,value#3] +- Generate explode(HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMap(key1,100,key2,200)), true, false, Some(genoutput), [key#2,value#3] +- SubqueryAlias src +- Project [_1#0 AS key#2,_2#1 AS value#3] +- LocalRelation [_1#0,_2#1], [[id1,value1]] ``` The above query fails with following runtime error. ``` java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:221) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(generated.java:42) at org.apache.spark.sql.execution.Generate$$anonfun$doExecute$1$$anonfun$apply$9.apply(Generate.scala:98) at org.apache.spark.sql.execution.Generate$$anonfun$doExecute$1$$anonfun$apply$9.apply(Generate.scala:96) at scala.collection.Iterator$$anon$11.next(Iterator.scala:370) at scala.collection.Iterator$$anon$11.next(Iterator.scala:370) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) <stack-trace omitted.....> ``` In this case the generated outputs are wrongly resolved from its child (LocalRelation) due to https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L537-L548 ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Added unit tests in hive/SQLQuerySuite and AnalysisSuite Author: Dilip Biswal <dbiswal@us.ibm.com> Closes apache#11497 from dilipbiswal/spark-13651.
…pressions ## What changes were proposed in this pull request? It's weird that expressions don't always have all the expressions in it. This PR marks `QueryPlan.expressions` final to forbid sub classes overriding it to exclude some expressions. Currently only `Generate` override it, we can use `producedAttributes` to fix the unresolved attribute problem for it. Note that this PR doesn't fix the problem in apache#11497 ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes apache#11532 from cloud-fan/generate.
What changes were proposed in this pull request?
Results in following logical plan
The above query fails with following runtime error.
In this case the generated outputs are wrongly resolved from its child (LocalRelation) due to
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L537-L548
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
Added unit tests in hive/SQLQuerySuite and AnalysisSuite