
[SPARK-12258] [SQL] passing null into ScalaUDF (follow-up) #10266

Closed
wants to merge 3 commits

Conversation

davies
Contributor

@davies davies commented Dec 11, 2015

This is a follow-up PR for #10259

@cloud-fan
Contributor

I tried it locally, here are my findings:

  • int i = false ? null : (Integer) 1; compiles
  • int i = false ? null : (Integer) t; compiles
  • int i = false ? null : (Integer) -1; doesn't compile
  • int i = false ? (Integer) null : (Integer) -1; doesn't compile
  • int i = false ? null : (Integer) (-1); compiles

So I think a simple fix is just adding () around ${eval.value}, but I can't think of a test case to reproduce it...
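The parsing quirk can be reproduced in plain Java; a minimal sketch (class name hypothetical, not from the PR):

```java
public class CastDemo {
    public static void main(String[] args) {
        // int i = false ? null : (Integer) -1;  // does NOT compile: after a cast to a
        // reference type, javac tries to parse "(Integer) - 1" as a subtraction with a
        // parenthesized name on the left, so the unary minus is rejected.
        int i = false ? null : (Integer) (-1);   // compiles: () forces the unary minus
        int j = false ? null : (Integer) 1;      // compiles: positive literal is unambiguous
        System.out.println(i + " " + j);         // prints "-1 1"
    }
}
```

This is why wrapping the casted operand in `()` is sufficient: it removes the grammar ambiguity for negative literals.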

@davies
Contributor Author

davies commented Dec 11, 2015

Could you try to add (Integer) before null?

@yhuai
Contributor

yhuai commented Dec 11, 2015

Wenchen, should the type of i be Integer?

@cloud-fan
Contributor

I changed int to Integer and tried again; the result is the same. I also tried Integer i = (Integer) -1;, which also failed to compile. I think the problem is that when we use a negative literal with an explicit cast, the - is mistakenly parsed as subtraction, and we need to wrap the literal with ().

@davies
Contributor Author

davies commented Dec 11, 2015

It's not a Janino bug; (Integer) -1 does not compile in plain Java either, faint :-(

@markhamstra
Contributor

@davies This results in a slightly different failure from the one I previously reported:

Everything looks the same as the prior post except now:

failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 78, Column 193: Expression "java.lang.Integer" is not an rvalue
.
.
.
/* 078 */     Long result16 = (Long)catalystConverter15.apply(udf20.apply(converter17.apply(isNull21 ? (UTF8String) null : (UTF8String) primitive22),converter18.apply(false ? (Integer) null : (Integer) -1),converter19.apply(isNull26 ? (Long) null : (Long) primitive27)));
.
.
.

@davies
Contributor Author

davies commented Dec 11, 2015

@markhamstra Sorry, just pushed a commit to fix it now, added a regression test, could you check it again?

@markhamstra
Contributor

No problem; I'll cherry-pick another.

@davies
Contributor Author

davies commented Dec 11, 2015

@markhamstra Once it works, I will merge this to unblock RC2.

@cloud-fan
Contributor

LGTM pending tests.

@markhamstra
Contributor

Still doesn't work for me. Now it ends up in a different place, but with an NPE:

...
2015-12-11 06:48:09,285 INFO org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection: Code generated in 145.67804 ms
2015-12-11 06:48:09,297 INFO org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection: Code generated in 4.438909 ms
2015-12-11 06:48:09,305 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
    at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:42)
    at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:261)
    at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:259)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-12-11 06:48:09,325 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NullPointerException
    (identical stack trace as above)
...

@davies
Contributor Author

davies commented Dec 11, 2015

@markhamstra I think it's because your UDF did not handle null correctly.

@markhamstra
Contributor

@davies The exact same UDF worked fine in 1.5.

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47572 has finished for PR 10266 at commit c0f85bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2015

Test build #2202 has finished for PR 10266 at commit c96b512.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class ExecutorClassLoader(

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47574 has finished for PR 10266 at commit c96b512.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

hi @markhamstra, can you add some logging in your UDF, to see if the NPE occurred before or after running into your UDF code?

@davies
Contributor Author

davies commented Dec 11, 2015

@cloud-fan @markhamstra They should be all fixed (handling null in arguments and results).

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47578 has finished for PR 10266 at commit 2125a1b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

boolean ${ev.isNull} = $resultTerm == null;
${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
if (!${ev.isNull}) {
${ev.value} = $resultTerm;
Contributor

ah that's it, the result type may be primitive and we should not assign a null value to it, or an NPE will happen.

Should we create a JIRA for it? I think it's a different bug compared to the one you fixed in #10259

Contributor

Seems we are fine because we check if (!${ev.isNull}) first?

Member

It will not cause NPE, but a compilation error?

For example, if dataType is Integer, line 1049 will be int ev.value = null.
This statement would trigger an incompatible types compilation error, right?

Contributor Author

It's Integer b = null; int a = (Integer) b; unboxing the null then throws the NPE
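That failure mode is easy to reproduce outside the generated code; a minimal sketch (class name hypothetical):

```java
public class UnboxNpeDemo {
    public static void main(String[] args) {
        Integer b = null;
        try {
            int a = (Integer) b;  // auto-unboxing calls b.intValue(), so a null b throws NPE
            System.out.println(a);
        } catch (NullPointerException e) {
            System.out.println("NPE on unboxing");  // prints "NPE on unboxing"
        }
    }
}
```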

Member

I can understand your fix, but I am trying to see what @cloud-fan said above. It sounds like he found another issue?

Contributor

@gatorsmile At line 1049, we are using the default value. For primitive types, it will not be null.

Contributor

My understanding is that he was trying to explain the NPE reported by @markhamstra.

Member

uh, the value of ${ctx.defaultValue(dataType)} is not null but -1 when data type is Integer. I do not have more questions. Thanks!

Contributor

sorry for being vague, I was trying to explain why the NPE happened and @davies has fixed it.
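The guarded pattern in the reviewed snippet can be sketched in plain Java (names hypothetical; the real code is emitted by Spark's codegen templates, and -1 stands in for ctx.defaultValue(dataType) for Integer, as noted above):

```java
public class GuardedAssignDemo {
    // Sketch of the generated pattern: never unbox a possibly-null boxed result
    // straight into a primitive slot; test for null first and fall back to a default.
    static int unwrap(Integer resultTerm) {
        boolean isNull = resultTerm == null;
        int value = -1;              // placeholder default; generated code also exposes isNull
        if (!isNull) {
            value = resultTerm;      // safe: resultTerm is known non-null here
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unwrap(7));     // prints "7"
        System.out.println(unwrap(null));  // prints "-1"
    }
}
```

Since the assignment is guarded by the null check, the unboxing never sees a null reference, which avoids the NPE reported earlier in the thread.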

@markhamstra
Contributor

Works for me. Thanks, guys!

@yhuai
Contributor

yhuai commented Dec 11, 2015

test this please

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47584 has finished for PR 10266 at commit 2125a1b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Dec 11, 2015

test this please

@SparkQA

SparkQA commented Dec 11, 2015

Test build #2204 has finished for PR 10266 at commit 2125a1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class ExecutorClassLoader(
      • case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends LeafExpression

@SparkQA

SparkQA commented Dec 11, 2015

Test build #2205 has finished for PR 10266 at commit 2125a1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class ExecutorClassLoader(

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47585 has finished for PR 10266 at commit 2125a1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Dec 11, 2015
This is a follow-up PR for #10259

Author: Davies Liu <davies@databricks.com>

Closes #10266 from davies/null_udf2.

(cherry picked from commit c119a34)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
@asfgit asfgit closed this in c119a34 Dec 11, 2015