[SPARK-13056][SQL] map column would throw NPE if value is null #10964
Conversation
Test build #50253 has finished for PR 10964 at commit
```diff
@@ -307,7 +307,7 @@ case class GetMapValue(child: Expression, key: Expression)
        }
      }
-     if ($found) {
+     if ($found && !$eval1.valueArray().isNullAt($index)) {
```
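The effect of the added guard can be illustrated outside of Spark with a minimal plain-Scala sketch (hypothetical names; the arrays stand in for the key and value arrays of a Catalyst map, and `.length` stands in for the size computation the unsafe writer performs): without the null check, touching a null map value throws an NPE; with it, the lookup degrades to a null result.

```scala
// Minimal sketch of the generated-code pattern, assuming a map stored as
// parallel key/value arrays where the value array may contain nulls.
object GetMapValueSketch {
  val keys   = Array("abc", "cba")
  val values = Array[String]("somestring", null) // "cba" maps to null

  // Buggy shape: mirrors `if ($found) { ... }` with no null check.
  // Measuring the value's size NPEs when the value is null.
  def lookupUnsafe(key: String): Int = {
    val index = keys.indexOf(key)
    if (index >= 0) values(index).length // NPE for key "cba"
    else -1
  }

  // Fixed shape: mirrors `if ($found && !valueArray.isNullAt(index))`.
  def lookupSafe(key: String): Option[Int] = {
    val index = keys.indexOf(key)
    if (index >= 0 && values(index) != null) Some(values(index).length)
    else None
  }
}
```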
can we assign `$eval1.valueArray()` to a local variable?
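The suggestion is standard codegen hygiene: evaluate `valueArray()` once and reuse the result, instead of calling the accessor in both the null check and the read. A plain-Scala sketch with hypothetical stand-in classes shows the difference in call counts:

```scala
// Hypothetical stand-in for an array of map values that may contain nulls.
class ValueArraySketch(data: Array[String]) {
  def isNullAt(i: Int): Boolean = data(i) == null
  def get(i: Int): String = data(i)
}

// Hypothetical stand-in for the map; counts accessor calls for illustration.
class MapDataSketch(values: ValueArraySketch) {
  var accesses = 0
  def valueArray(): ValueArraySketch = { accesses += 1; values }
}

// Without hoisting: valueArray() is evaluated in both the test and the read.
def lookupNoHoist(m: MapDataSketch, i: Int): Option[String] =
  if (!m.valueArray().isNullAt(i)) Some(m.valueArray().get(i)) else None

// With hoisting, as the reviewer suggests: one evaluation, then reuse.
def lookupHoisted(m: MapDataSketch, i: Int): Option[String] = {
  val values = m.valueArray()
  if (!values.isNullAt(i)) Some(values.get(i)) else None
}
```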
```scala
test("SPARK-13056: Null in map value causes NPE") {
  Seq((1, "abc=somestring,cba")).toDF("key", "value").registerTempTable("mapsrc")
  sql("""CREATE TABLE maptest AS SELECT str_to_map(value, ",", "=") as col1 FROM mapsrc""")
```
The `str_to_map` stuff makes this test a little hard to read. Can we create a DataFrame with

```scala
val df = Seq(1 -> Map("a" -> "1", "b" -> null)).toDF("key", "value")
```

and test it with the DataFrame APIs directly?
Test build #50339 has finished for PR 10964 at commit
```scala
test("SPARK-13056: Null in map value causes NPE") {
  val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
  df.registerTempTable("maptest")
```
Do we need to put the test in the hive module? I think `DataFrameSuite` in the sql core module is a good place for this test; we can just test `df.select($"value".apply("abc"))` instead of registering a temp table.
That suite mainly tests DataFrame API functionality, I think.
How about `SQLQuerySuite` in sql core? This bug has nothing to do with Hive, right?
Test build #50362 has finished for PR 10964 at commit
```scala
test("SPARK-13056: Null in map value causes NPE") {
  val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
  df.registerTempTable("maptest")
```
Use `withTempTable` to drop the table after the test.
```scala
val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
df.registerTempTable("maptest")
checkAnswer(sql("SELECT value['abc'] FROM maptest"), Row("somestring"))
checkAnswer(sql("SELECT value['cba'] FROM maptest"), Row(null))
```
Have you verified that this throws an NPE without the fix? The test runs fine on trunk:

```scala
scala> val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
df: org.apache.spark.sql.DataFrame = [key: int, value: map<string,string>]

scala> df.registerTempTable("maptest")

scala> sqlContext.sql("SELECT value['cba'] FROM maptest").collect()
res28: Array[org.apache.spark.sql.Row] = Array([null])

scala> sqlContext.sql("SELECT value['cba'] FROM maptest").foreach(println)
[null]
```
I think we need to use an `RDD` instead of a `Seq` to build the `DataFrame`, or the local optimization will evaluate it directly, without codegen.
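The distinction @cloud-fan is drawing can be hedged as a plain-Scala analogy (the real paths are Catalyst's local/interpreted evaluation versus generated unsafe projections): a direct map lookup tolerates a null value, while a "writer" that immediately measures the value's size does not, so a test has to force the codegen path to expose the bug.

```scala
// Analogy only. "Interpreted" path: hand back the raw value, null included.
def interpretedLookup(m: Map[String, String], k: String): String =
  m.getOrElse(k, null)

// "Codegen writer" path: measures the value's size before writing it,
// like UTF8StringWriter.getSize in the stack trace below -- NPEs on null.
def writerPath(m: Map[String, String], k: String): Int =
  m.getOrElse(k, null).length
```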
You need to modify the test case:

```scala
scala> sqlContext.sql("SELECT value['cba'] FROM maptest WHERE key = 1").collect()
16/01/31 21:22:13 ERROR Executor: Exception in task 15.0 in stage 2.0 (TID 47)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	...
```
@tejasapatil OK, actually I used a UDF to generate the map and got your exception; later I changed it to this following the suggestion from @cloud-fan but didn't verify it myself. I'll modify the test case here. Thanks!
Test build #50472 has finished for PR 10964 at commit
Test build #50474 has finished for PR 10964 at commit
Test build #50477 has finished for PR 10964 at commit
```scala
val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
withTempTable("maptest") {
  df.registerTempTable("maptest")
  checkAnswer(sql("SELECT value['abc'] FROM maptest where key = 1"), Row("somestring"))
```
As I explained before, the problem is local optimization: #10964 (comment). Adding a filter here does fix the problem by breaking the local optimization, and we should add a comment saying so.
Yes, you are right.
LGTM, pending test
Test build #50483 has finished for PR 10964 at commit
retest this please.
Test build #50485 has finished for PR 10964 at commit
Merged to master and 1.6 |
Jira: https://issues.apache.org/jira/browse/SPARK-13056

Create a map like

```
{ "a": "somestring", "b": null }
```

then query like

```sql
SELECT col["b"] FROM t1;
```

and an NPE would be thrown.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #10964 from adrian-wang/npewriter.

(cherry picked from commit 358300c)
Signed-off-by: Michael Armbrust <michael@databricks.com>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala