-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-11725][SQL] correctly handle null inputs for UDF #9770
Conversation
cc @marmbrus |
Test build #46088 has finished for PR 9770 at commit
|
assert(parameterTypes.length == inputs.length) | ||
|
||
parameterTypes.zip(inputs).filter(_._1.isPrimitive).map(_._2).foldLeft(udf: Expression) { | ||
case (result, input) => If(IsNull(input), Literal.create(null, udf.dataType), result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a lot easier to read in the query plan if you created a single If
with Or
s.
Test build #46090 has finished for PR 9770 at commit
|
Test build #46130 has finished for PR 9770 at commit
|
assert(parameterTypes.length == inputs.length) | ||
|
||
val inputsNullCheck = parameterTypes.zip(inputs) | ||
// TODO: skip null handling for not-nullable primitive inputs after we can completely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the fact that most of the common code passes are not using nullable
(for example generated expression, join), it could have some corner cases that the nullable
is not generated correctly (for some data sources), I think it's risky for 1.6.
I'd vote to do that in next release (consider nullable
in most places)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To play devils advocate, I think when the info is wrong is usually likely to be too conservative (allow nulls when there are none). Also, I'm not really sure what is going to change between now and 1.7 (i.e. if there are bugs we need to find them eventually).
That said, I'm fine waiting, but we should use this info eventually given the amount of effort we spend passing it around.
Test build #46169 has finished for PR 9770 at commit
|
Test build #46171 has finished for PR 9770 at commit
|
Looks great! Merging to master and 1.6. |
If user use primitive parameters in UDF, there is no way for him to do the null-check for primitive inputs, so we are assuming the primitive input is null-propagatable for this case and return null if the input is null. Author: Wenchen Fan <wenchen@databricks.com> Closes #9770 from cloud-fan/udf. (cherry picked from commit 33b8373) Signed-off-by: Michael Armbrust <michael@databricks.com>
If user use primitive parameters in UDF, there is no way for him to do the null-check for primitive inputs, so we are assuming the primitive input is null-propagatable for this case and return null if the input is null.