
[SPARK-13056][SQL] map column would throw NPE if value is null #10964

Closed · 7 commits

Conversation

adrian-wang
Contributor

Jira: https://issues.apache.org/jira/browse/SPARK-13056

Create a map like `{ "a": "somestring", "b": null }` and query it like `SELECT col["b"] FROM t1;`: an NPE would be thrown.
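The failure mode can be sketched without Spark: the generated projection looked up the map value and handed it to a writer that dereferenced it unconditionally. A minimal Java sketch (all names here are hypothetical placeholders, not Spark's actual generated identifiers) of the unguarded size call and the null-safe lookup:

```java
import java.util.HashMap;
import java.util.Map;

public class MapNullLookup {
    // Mirrors the writer's behavior: dereferences the value unconditionally,
    // so a null map value triggers a NullPointerException here.
    static int getSize(String value) {
        return value.length();
    }

    // Guarded lookup, analogous to the fix: check for null before using the value.
    static Integer safeSize(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value == null) {
            return null; // propagate SQL NULL instead of throwing an NPE
        }
        return getSize(value);
    }

    public static void main(String[] args) {
        Map<String, String> col = new HashMap<>();
        col.put("a", "somestring");
        col.put("b", null);
        System.out.println(safeSize(col, "a")); // 10
        System.out.println(safeSize(col, "b")); // null
    }
}
```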

@SparkQA

SparkQA commented Jan 28, 2016

Test build #50253 has finished for PR 10964 at commit 5b2d942.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -307,7 +307,7 @@ case class GetMapValue(child: Expression, key: Expression)
     }
   }
-        if ($found) {
+        if ($found && !$eval1.valueArray().isNullAt($index)) {
Contributor

Can we assign `$eval1.valueArray()` to a local variable?
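For illustration only (identifiers are placeholders, not Spark's generated names), the reviewer's suggestion amounts to evaluating the value array once and reusing the local, rather than calling `valueArray()` in both the null check and the read. A Java sketch of the fixed lookup shape:

```java
// Hypothetical sketch of GetMapValue's generated lookup after the fix,
// with the value array hoisted into a local as suggested.
public class GetMapValueSketch {
    static String lookup(String[] keys, String[] values, String key) {
        int index = 0;
        boolean found = false;
        while (index < keys.length && !found) { // linear scan over the map's keys
            if (keys[index].equals(key)) {
                found = true;
            } else {
                index++;
            }
        }
        String[] valueArray = values;              // evaluated once, reused below
        if (found && valueArray[index] != null) {  // the added null check
            return valueArray[index];
        }
        return null; // SQL NULL for a missing key or a null value
    }

    public static void main(String[] args) {
        String[] keys = {"abc", "cba"};
        String[] values = {"somestring", null};
        System.out.println(lookup(keys, values, "abc")); // somestring
        System.out.println(lookup(keys, values, "cba")); // null
        System.out.println(lookup(keys, values, "xyz")); // null (key not found)
    }
}
```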


test("SPARK-13056: Null in map value causes NPE") {
Seq((1, "abc=somestring,cba")).toDF("key", "value").registerTempTable("mapsrc")
sql("""CREATE TABLE maptest AS SELECT str_to_map(value, ",", "=") as col1 FROM mapsrc""")
Contributor

The str_to_map stuff makes this test a little hard to read. Can we create a DataFrame with
`val df = Seq(1 -> Map("a" -> "1", "b" -> null)).toDF("key", "value")` and test it via the DataFrame API directly?

@SparkQA

SparkQA commented Jan 29, 2016

Test build #50339 has finished for PR 10964 at commit 2244999.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


test("SPARK-13056: Null in map value causes NPE") {
val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
df.registerTempTable("maptest")
Contributor

Do we need to put the test in the hive module? I think DataFrameSuite in the sql core module is a good place for this test; we can just test `df.select($"value".apply("abc"))` instead of registering a temp table.

Contributor Author

That suite mainly tests DataFrame API functionality, I think.

Contributor

How about SQLQuerySuite in sql core? This bug has nothing to do with Hive, right?

@SparkQA

SparkQA commented Jan 29, 2016

Test build #50362 has finished for PR 10964 at commit bf429ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


test("SPARK-13056: Null in map value causes NPE") {
val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
df.registerTempTable("maptest")
Contributor

Use withTempTable to drop the table after the test.

val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
df.registerTempTable("maptest")
checkAnswer(sql("SELECT value['abc'] FROM maptest"), Row("somestring"))
checkAnswer(sql("SELECT value['cba'] FROM maptest"), Row(null))
Contributor

Have you verified that this throws an NPE without the fix? The test runs fine on trunk:

scala> val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
df: org.apache.spark.sql.DataFrame = [key: int, value: map<string,string>]

scala> df.registerTempTable("maptest")

scala> sqlContext.sql("SELECT value['cba'] FROM maptest").collect()
res28: Array[org.apache.spark.sql.Row] = Array([null])

scala> sqlContext.sql("SELECT value['cba'] FROM maptest").foreach(println)
[null]

Contributor

I think we need to use RDD instead of Seq to build the DataFrame, or the local optimization will evaluate it directly, without codegen.

Contributor

You need to modify the test case:

scala> sqlContext.sql("SELECT value['cba'] FROM maptest WHERE key = 1").collect()
16/01/31 21:22:13 ERROR Executor: Exception in task 15.0 in stage 2.0 (TID 47)
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
    at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)....
....

Contributor Author

@tejasapatil OK, actually I originally used a UDF to generate the map and got your exception; later I changed to this form following @cloud-fan's suggestion but didn't verify it myself. I'll modify the test case here. Thanks!

@SparkQA

SparkQA commented Feb 1, 2016

Test build #50472 has finished for PR 10964 at commit 21f29a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 1, 2016

Test build #50474 has finished for PR 10964 at commit 898cf20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 1, 2016

Test build #50477 has finished for PR 10964 at commit d77013f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value")
withTempTable("maptest") {
df.registerTempTable("maptest")
checkAnswer(sql("SELECT value['abc'] FROM maptest where key = 1"), Row("somestring"))
Contributor

As I explained before, the problem is local optimization: #10964 (comment). Adding a filter here does fix the problem by breaking the local optimization, and we should add a comment to say so.

Contributor Author

Yes, you are right.

@cloud-fan
Contributor

LGTM, pending test

@SparkQA

SparkQA commented Feb 1, 2016

Test build #50483 has finished for PR 10964 at commit 5b83626.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@adrian-wang
Contributor Author

retest this please.

@SparkQA

SparkQA commented Feb 1, 2016

Test build #50485 has finished for PR 10964 at commit 5b83626.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit closed this in 358300c on Feb 2, 2016
@marmbrus
Contributor

marmbrus commented Feb 2, 2016

Merged to master and 1.6

asfgit pushed a commit that referenced this pull request Feb 2, 2016
Jira:
https://issues.apache.org/jira/browse/SPARK-13056

Create a map like
{ "a": "somestring", "b": null}
Query like
SELECT col["b"] FROM t1;
NPE would be thrown.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #10964 from adrian-wang/npewriter.

(cherry picked from commit 358300c)
Signed-off-by: Michael Armbrust <michael@databricks.com>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala