
[SPARK-12258] [SQL] passing null into ScalaUDF (follow-up) #10266

Closed
wants to merge 3 commits

Conversation

davies
Contributor

@davies davies commented Dec 11, 2015

This is a follow-up PR for #10259

@cloud-fan
Contributor

I tried it locally, here are my findings:

  • int i = false ? null : (Integer) 1; compiles
  • int i = false ? null : (Integer) t; compiles
  • int i = false ? null : (Integer) -1; doesn't compile
  • int i = false ? (Integer) null : (Integer) -1; doesn't compile
  • int i = false ? null : (Integer) (-1); compiles

So I think a simple fix is just adding () around ${eval.value}, but I can't think of a test case to reproduce it...
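The parsing quirk can be reproduced in plain Java; a minimal sketch (class name hypothetical, not from the PR):

```java
public class CastDemo {
    public static void main(String[] args) {
        // int i = false ? null : (Integer) -1;  // does NOT compile: after a cast to a
        // reference type, javac tries to parse "(Integer) - 1" as a subtraction with a
        // parenthesized name on the left, so the unary minus is rejected.
        int i = false ? null : (Integer) (-1);   // compiles: () forces the unary minus
        int j = false ? null : (Integer) 1;      // compiles: positive literal is unambiguous
        System.out.println(i + " " + j);         // prints "-1 1"
    }
}
```

This is why wrapping the casted operand in `()` is sufficient: it removes the grammar ambiguity for negative literals.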

@davies
Contributor Author

davies commented Dec 11, 2015

Could you try to add (Integer) before null?

@yhuai
Contributor

yhuai commented Dec 11, 2015

Wenchen, should the type of i be Integer?

@cloud-fan
Contributor

I changed int to Integer and tried again; the result is the same. I also tried Integer i = (Integer) -1;, which also failed to compile. I think the problem is that when we use a negative literal with an explicit cast, the - is mistakenly parsed as subtraction, and we need to wrap the literal with ().

@davies
Contributor Author

davies commented Dec 11, 2015

It's not a Janino bug; (Integer) -1 does not compile in plain Java either, faint :-(

@markhamstra
Contributor

@davies This results in a slightly different failure from the one I previously reported:

Everything looks the same as the prior post except now:

failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 78, Column 193: Expression "java.lang.Integer" is not an rvalue
.
.
.
/* 078 */     Long result16 = (Long)catalystConverter15.apply(udf20.apply(converter17.apply(isNull21 ? (UTF8String) null : (UTF8String) primitive22),converter18.apply(false ? (Integer) null : (Integer) -1),converter19.apply(isNull26 ? (Long) null : (Long) primitive27)));
.
.
.

@davies
Contributor Author

davies commented Dec 11, 2015

@markhamstra Sorry, just pushed a commit to fix it now, added a regression test, could you check it again?

@markhamstra
Contributor

No problem; I'll cherry-pick another.

@davies
Contributor Author

davies commented Dec 11, 2015

@markhamstra Once it works, I will merge this to unblock RC2.

@cloud-fan
Contributor

LGTM pending tests.

@markhamstra
Contributor

Still doesn't work for me. Now it ends up in a different place, but with an NPE:

...
2015-12-11 06:48:09,285 INFO org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection: Code generated in 145.67804 ms
2015-12-11 06:48:09,297 INFO org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection: Code generated in 4.438909 ms
2015-12-11 06:48:09,305 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
    at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:42)
    at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:261)
    at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:259)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-12-11 06:48:09,325 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NullPointerException
    (identical stack trace as above)
...

@davies
Contributor Author

davies commented Dec 11, 2015

@markhamstra I think it's because your UDF did not handle null correctly.

@markhamstra
Contributor

@davies The exact same UDF worked fine in 1.5.

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47572 has finished for PR 10266 at commit c0f85bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2015

Test build #2202 has finished for PR 10266 at commit c96b512.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class ExecutorClassLoader(

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47574 has finished for PR 10266 at commit c96b512.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

hi @markhamstra, can you add some logging in your UDF, to see if the NPE occurred before or after running into your UDF code?

@davies
Contributor Author

davies commented Dec 11, 2015

@cloud-fan @markhamstra They should be all fixed (handling null in arguments and results).

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47578 has finished for PR 10266 at commit 2125a1b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

boolean ${ev.isNull} = $resultTerm == null;
${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
if (!${ev.isNull}) {
${ev.value} = $resultTerm;
Contributor

ah that's it, the result type may be primitive and we should not assign a null value to it, or an NPE will happen.

Should we create a JIRA for it? I think it's a different bug compared to the one you fixed in #10259

Contributor

Seems we are fine because we check if (!${ev.isNull}) first?

Member

It will not cause NPE, but a compilation error?

For example, if dataType is Integer, line 1049 will be int ev.value = null.
This statement would trigger an incompatible types compilation error, right?

Contributor Author

It's Integer b = null; int a = (Integer) b; unboxing the null then throws the NPE
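That failure mode is easy to reproduce outside the generated code; a minimal sketch (class name hypothetical):

```java
public class UnboxNpeDemo {
    public static void main(String[] args) {
        Integer b = null;
        try {
            int a = (Integer) b;  // auto-unboxing calls b.intValue(), so a null b throws NPE
            System.out.println(a);
        } catch (NullPointerException e) {
            System.out.println("NPE on unboxing");  // prints "NPE on unboxing"
        }
    }
}
```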

Member

I can understand your fix, but I am trying to see what @cloud-fan said above. It sounds like he found another issue?

Contributor

@gatorsmile At line 1049, we are using the default value. For primitive types, it will not be null.

Contributor

My understanding is that he was trying to explain the NPE reported by @markhamstra.

Member

uh, the value of ${ctx.defaultValue(dataType)} is not null but -1 when data type is Integer. I do not have more questions. Thanks!

Contributor

sorry for being vague, I was trying to explain why the NPE happened and @davies has fixed it.
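The guarded pattern in the reviewed snippet can be sketched in plain Java (names hypothetical; the real code is emitted by Spark's codegen templates, and -1 stands in for ctx.defaultValue(dataType) for Integer, as noted above):

```java
public class GuardedAssignDemo {
    // Sketch of the generated pattern: never unbox a possibly-null boxed result
    // straight into a primitive slot; test for null first and fall back to a default.
    static int unwrap(Integer resultTerm) {
        boolean isNull = resultTerm == null;
        int value = -1;              // placeholder default; generated code also exposes isNull
        if (!isNull) {
            value = resultTerm;      // safe: resultTerm is known non-null here
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unwrap(7));     // prints "7"
        System.out.println(unwrap(null));  // prints "-1"
    }
}
```

Since the assignment is guarded by the null check, the unboxing never sees a null reference, which avoids the NPE reported earlier in the thread.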

@markhamstra
Contributor

Works for me. Thanks, guys!

@yhuai
Contributor

yhuai commented Dec 11, 2015

test this please

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47584 has finished for PR 10266 at commit 2125a1b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Dec 11, 2015

test this please

@SparkQA

SparkQA commented Dec 11, 2015

Test build #2204 has finished for PR 10266 at commit 2125a1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class ExecutorClassLoader(
      • case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends LeafExpression

@SparkQA

SparkQA commented Dec 11, 2015

Test build #2205 has finished for PR 10266 at commit 2125a1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class ExecutorClassLoader(

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47585 has finished for PR 10266 at commit 2125a1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Dec 11, 2015
This is a follow-up PR for #10259

Author: Davies Liu <davies@databricks.com>

Closes #10266 from davies/null_udf2.

(cherry picked from commit c119a34)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
@asfgit asfgit closed this in c119a34 Dec 11, 2015