[SPARK-13651] Generator outputs are not resolved correctly resulting in run time error #11497

dilipbiswal · 2016-03-03T16:28:53Z

What changes were proposed in this pull request?

Seq(("id1", "value1")).toDF("key", "value").registerTempTable("src")
sqlContext.sql("SELECT t1.* FROM src LATERAL VIEW explode(map('key1', 100, 'key2', 200)) t1 AS key, value")

This results in the following logical plan:

Project [key#2,value#3]
+- Generate explode(HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMap(key1,100,key2,200)), true, false, Some(genoutput), [key#2,value#3]
   +- SubqueryAlias src
      +- Project [_1#0 AS key#2,_2#1 AS value#3]
         +- LocalRelation [_1#0,_2#1], [[id1,value1]]

The above query fails with the following runtime error:

java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
    at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:221)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(generated.java:42)
    at org.apache.spark.sql.execution.Generate$$anonfun$doExecute$1$$anonfun$apply$9.apply(Generate.scala:98)
    at org.apache.spark.sql.execution.Generate$$anonfun$doExecute$1$$anonfun$apply$9.apply(Generate.scala:96)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
        <stack-trace omitted.....>

In this case the generator's output attributes are wrongly resolved from the output of its child (LocalRelation) because of this code:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L537-L548
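
To illustrate why the wrong resolution shows up as a ClassCastException, here is a minimal sketch in plain Scala. It does not use Spark: `generatedRow`, `getString` and `getInt` are hypothetical stand-ins for a generated row and its typed accessors. The assumption is that `value` resolved from the child is treated as a string column, while the generator actually produces an integer.

```scala
import scala.util.Try

object MisresolvedTypeSketch {
  // Hypothetical stand-in for one row produced by explode(map('key1', 100, ...)).
  val generatedRow: Array[Any] = Array("key1", 100)

  // Typed accessors, loosely analogous to getUTF8String / getInt on an internal row.
  def getString(row: Array[Any], ordinal: Int): String = row(ordinal).asInstanceOf[String]
  def getInt(row: Array[Any], ordinal: Int): Int = row(ordinal).asInstanceOf[Int]

  // `value` resolved from the child (a string column): the string accessor is used and fails.
  val wrong: Try[String] = Try(getString(generatedRow, 1))

  // `value` resolved from the generator (an int): the int accessor is used and succeeds.
  val right: Try[Int] = Try(getInt(generatedRow, 1))
}
```

In this sketch `wrong` is a failure carrying a ClassCastException, which mirrors the Integer to UTF8String cast error in the trace above.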

How was this patch tested?


Added unit tests in hive/SQLQuerySuite and AnalysisSuite

dilipbiswal · 2016-03-03T16:30:46Z

cc @cloud-fan @gatorsmile

gatorsmile · 2016-03-03T16:30:55Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala

@@ -92,6 +92,16 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
    checkAnswer(query, Row(1, 1) :: Row(1, 2) :: Row(1, 3) :: Nil)
  }

+  test("SPARK-13651: generator outputs shouldn't be resolved from its child's outpu") {


Nit: outpu -> output

Thanks. Fixed it.

gatorsmile · 2016-03-03T16:32:56Z

ok to test

gatorsmile · 2016-03-03T16:33:06Z

LGTM. Please update your PR description with the runtime error message you hit. (It is not necessary to post the whole stack; I think the first few lines are enough.) Thanks!

SparkQA · 2016-03-03T22:02:08Z

Test build #2610 has finished for PR 11497 at commit abf868f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-03-03T23:17:32Z

retest this please

dilipbiswal · 2016-03-03T23:24:36Z

I ran this test 5 times on my development machine and it failed once, so it looks like an intermittent failure. I also verified the plan for the failing test.

Project [(id#0L % cast(2 as bigint)) AS key#1L,if (isnull(cast(id#0L as int))) null else UDF(cast(id#0L as int)) AS UDF(id)#2]
+- Range 0, 10, 1, 32, [id#0L]

There is no Generate node in the plan, so the fix shouldn't affect this test case.

cloud-fan · 2016-03-04T01:52:59Z

retest this please

SparkQA · 2016-03-04T03:51:57Z

Test build #52432 has finished for PR 11497 at commit abf868f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-03-04T08:36:34Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

+
+    val query =
+      input
+        .generate(Explode(generatorInput), join = true,


Is this a valid plan? The input to Explode is an Attribute named a, which is not in the output of input.

@cloud-fan Thank you, you are right. Wenchen, I just realized that it's pretty hard to simulate the error in AnalysisSuite. For this problem to happen, the rules need to fire in the following sequence:

1. ResolveGenerate is a no-op because the generator is not resolved yet.

2. The generator is resolved through ResolveFunctions.

3. ResolveReferences now resolves the generator output attributes from the child's output.

In AnalysisSuite we have an empty function registry, so I am unable to simulate this error in that test. If you are OK with it, I am thinking of removing this test and covering the case through SQLQuerySuite instead.

Please let me know what you think.
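
The rule sequence above can be sketched with a toy fixed-point analyzer in plain Scala. This is only an illustration under stated assumptions: `Attr`, `Bound`, `Generate`, `analyze` and the three rule values are hypothetical stand-ins and not Spark's Catalyst classes, and the rule order is simply the order listed above.

```scala
object RuleOrderSketch {
  // An output attribute is either a bare name or bound to a source with a type.
  sealed trait Attr { def name: String }
  case class Unresolved(name: String) extends Attr
  case class Bound(name: String, source: String, dataType: String) extends Attr

  // Toy Generate node (hypothetical, not Catalyst's Generate).
  case class Generate(
      generatorResolved: Boolean,
      generatorTypes: Map[String, String], // what the generator really produces
      output: Seq[Attr],
      childOutput: Seq[Bound])

  type Rule = Generate => Generate

  // Binds outputs to the generator, but only once the generator is resolved.
  val resolveGenerate: Rule = g => {
    def bind(a: Attr): Attr = a match {
      case Unresolved(n) => Bound(n, "generator", g.generatorTypes(n))
      case other => other
    }
    if (!g.generatorResolved) g else g.copy(output = g.output.map(bind))
  }

  // Stands in for the function lookup that resolves the generator.
  val resolveFunctions: Rule = g => g.copy(generatorResolved = true)

  // Unpatched: a resolved generator falls through to generic resolution from the child.
  // Patched: a resolved generator is returned untouched.
  def resolveReferences(patched: Boolean): Rule = g => {
    def bind(a: Attr): Attr = a match {
      case Unresolved(n) => g.childOutput.find(_.name == n).getOrElse(Unresolved(n))
      case other => other
    }
    if (!g.generatorResolved || patched) g else g.copy(output = g.output.map(bind))
  }

  // Applies the rules in order until the plan stops changing.
  def analyze(plan: Generate, rules: Seq[Rule]): Generate = {
    val next = rules.foldLeft(plan)((p, rule) => rule(p))
    if (next == plan) plan else analyze(next, rules)
  }

  val plan = Generate(
    generatorResolved = false,
    generatorTypes = Map("key" -> "string", "value" -> "int"),
    output = Seq(Unresolved("key"), Unresolved("value")),
    childOutput = Seq(Bound("key", "child", "string"), Bound("value", "child", "string")))

  val buggy = analyze(plan, Seq(resolveGenerate, resolveFunctions, resolveReferences(patched = false)))
  val fixed = analyze(plan, Seq(resolveGenerate, resolveFunctions, resolveReferences(patched = true)))
}
```

In this toy model `buggy.output` ends up bound to the child, so `value` is typed as a string, while `fixed.output` is bound to the generator and `value` is typed as an int. With a function registry that cannot resolve the generator, the second step never happens, which is the difficulty described for AnalysisSuite.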


cloud-fan · 2016-03-06T08:20:42Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -512,6 +512,9 @@ class Analyzer(

      // A special case for Generate, because the output of Generate should not be resolved by
      // ResolveReferences. Attributes in the output will be resolved by ResolveGenerate.
+      case g @ Generate(generator, _, _, _, _, _)
+        if !g.resolved && generator.resolved => g
+
      case g @ Generate(generator, join, outer, qualifier, output, child)


I think these 2 cases can be simplified to:

case g: Generate if g.generator.resolved => g
case g @ Generate(generator, join, outer, qualifier, output, child) => the generator resolution logic...

We only care about whether the generator is resolved or not.

@cloud-fan Thanks !! Made the change.

Should we still keep if child.resolved?

@davies Hi Davies, I was also thinking about it. I felt it's probably safer to handle the Generate plan entirely in these two cases and not fall through to the last case, as happens for this defect.
@cloud-fan What do you think?

The if child.resolved condition is already guaranteed at the beginning of this rule:
case p: LogicalPlan if !p.childrenResolved => p

@cloud-fan Thank you !!
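
The case ordering discussed in this review thread can be sketched as follows. This is a toy pattern match in plain Scala, assuming hypothetical `Plan`, `Generate` and `Other` types rather than Catalyst's classes, and it returns a label for the branch taken instead of transforming a plan.

```scala
object CaseOrderSketch {
  // Hypothetical stand-ins for logical plan nodes.
  sealed trait Plan { def childrenResolved: Boolean }
  case class Generate(generatorResolved: Boolean, childrenResolved: Boolean) extends Plan
  case class Other(childrenResolved: Boolean) extends Plan

  // Cases are tried top to bottom, so the first case makes a later
  // child.resolved guard on the Generate cases redundant.
  def branch(plan: Plan): String = plan match {
    case p if !p.childrenResolved => "skip: children not resolved yet"
    case g: Generate if g.generatorResolved => "skip: outputs are left to ResolveGenerate"
    case _: Generate => "resolve the generator against the child"
    case _ => "generic attribute resolution"
  }
}
```

Because a Generate node always matches one of the two Generate cases, it can no longer fall through to the generic case in this sketch.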

cloud-fan · 2016-03-07T02:51:07Z

LGTM, cc @davies (who fixed this special case before)

dilipbiswal · 2016-03-07T03:30:19Z

@cloud-fan Can we trigger a test, please?

gatorsmile · 2016-03-07T03:57:39Z

test this please

cloud-fan · 2016-03-07T04:32:45Z

retest this please

SparkQA · 2016-03-07T06:37:14Z

Test build #52540 has finished for PR 11497 at commit 93d6e69.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-03-07T17:46:03Z

Merging into master, thanks!

…pressions

## What changes were proposed in this pull request?

It's weird that expressions don't always have all the expressions in it. This PR marks `QueryPlan.expressions` final to forbid sub classes overriding it to exclude some expressions. Currently only `Generate` override it, we can use `producedAttributes` to fix the unresolved attribute problem for it.

Note that this PR doesn't fix the problem in #11497

## How was this patch tested?

existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #11532 from cloud-fan/generate.

dilipbiswal · 2016-03-07T18:49:37Z

@cloud-fan @davies @gatorsmile Thank you !!

…in run time error

## What changes were proposed in this pull request?

```
Seq(("id1", "value1")).toDF("key", "value").registerTempTable("src")
sqlContext.sql("SELECT t1.* FROM src LATERAL VIEW explode(map('key1', 100, 'key2', 200)) t1 AS key, value")
```

Results in following logical plan

```
Project [key#2,value#3]
+- Generate explode(HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMap(key1,100,key2,200)), true, false, Some(genoutput), [key#2,value#3]
   +- SubqueryAlias src
      +- Project [_1#0 AS key#2,_2#1 AS value#3]
         +- LocalRelation [_1#0,_2#1], [[id1,value1]]
```

The above query fails with following runtime error.

```
java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
    at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:221)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(generated.java:42)
    at org.apache.spark.sql.execution.Generate$$anonfun$doExecute$1$$anonfun$apply$9.apply(Generate.scala:98)
    at org.apache.spark.sql.execution.Generate$$anonfun$doExecute$1$$anonfun$apply$9.apply(Generate.scala:96)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
        <stack-trace omitted.....>
```

In this case the generated outputs are wrongly resolved from its child (LocalRelation) due to
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L537-L548

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

Added unit tests in hive/SQLQuerySuite and AnalysisSuite

Author: Dilip Biswal <dbiswal@us.ibm.com>

Closes apache#11497 from dilipbiswal/spark-13651.


gatorsmile reviewed Mar 3, 2016

cloud-fan reviewed Mar 4, 2016

This was referenced Mar 4, 2016

[SPARK-13678][SQL] transformExpressions should only apply on QueryPlan.expressions #11521

Closed

[SPARK-13694][SQL] QueryPlan.expressions should always include all expressions #11532

Closed

[SPARK-13651] Generator outputs are not resolved correctly resulting …

c27caa4

…in runtime error

dilipbiswal force-pushed the spark-13651 branch from abf868f to c27caa4 Compare March 5, 2016 23:23

cloud-fan reviewed Mar 6, 2016

Review comments

93d6e69

asfgit closed this in d7eac9d Mar 7, 2016
