-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35389][SQL] V2 ScalarFunction should support magic method with null arguments #32553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
cc @cloud-fan |
|
Test build #138564 has finished for PR 32553 at commit
|
viirya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about primitive type parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it shouldn't matter. Here we're making sure that the magic method will always be invoked regardless of null-ness of the arguments.
For primitive types this has less significant meaning because even if progagateNull is true, needNullCheck
protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)
in InvokeLike also considers null-ness of arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can users handle nullable int values if the UDF is something like isPositive(int i) which can't accept null argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean, thanks. Seems we should allow users to define magic method with boxed primitive types for this case? We could also follow the behavior of ScalaUDF and returns null if any of the primitive type parameter is nullable and the input is null, however currently InvokeLike cannot handle the case where a subset of the input types are of primitive nullable type.
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138649 has finished for PR 32553 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138650 has finished for PR 32553 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138651 has finished for PR 32553 at commit
|
|
Kubernetes integration test starting |
| def propagateNull: Boolean | ||
|
|
||
| protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable) | ||
| def propagateNullForPrimitive: Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me of the rule HandleNullInputsForUDF.
For primitive inputs, I don't think we have a choice and we must propagate null (isPositive(int i) can't handle null values). Can we detect it automatically instead of adding a new boolean flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let me remove the flag then, and add an extra comment to propagateNull.
One issue with this approach is that it only applies to the magic method path, while for produceResult users will need to handle the null primitive values explicitly. I think we'll need to document this more carefully, otherwise it could cause confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I agree. If users want to handle null by themselves, they should use boxed primitive types as the UDF parameter type.
|
Kubernetes integration test status success |
|
Test build #138655 has finished for PR 32553 at commit
|
| case Some(m) if Modifier.isStatic(m.getModifiers) => | ||
| StaticInvoke(scalarFunc.getClass, scalarFunc.resultType(), | ||
| MAGIC_METHOD_NAME, arguments, returnNullable = scalarFunc.isResultNullable) | ||
| MAGIC_METHOD_NAME, arguments, propagateNull = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think we need propagateNull as true, instead of false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If propagateNull is true, we'd return null directly even if input arguments are of non-primitive type, which is not what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks.
| protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable) | ||
| protected lazy val needNullCheck: Boolean = needNullCheckForIndex.contains(true) | ||
| protected lazy val needNullCheckForIndex: Array[Boolean] = | ||
| arguments.map(a => a.nullable && (propagateNull || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this change, I think the definition of propagateNull is somehow different now. Previously if propagateNull is true, null will be propagated, but now it also depends on if the argument is primitive type. It is better to update propagateNull param doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nvm. I saw you updated it below.
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
thanks, merging to master! |
|
Test build #138666 has finished for PR 32553 at commit
|
|
Thanks @cloud-fan and @viirya for the review! |
What changes were proposed in this pull request?
When creating
InvokeandStaticInvokeforScalarFunction's magic method, setpropagateNullto false.Why are the changes needed?
When
propgagateNullis true (which is the default value),InvokeandStaticInvokewill return null if any of the argument is null. For scalar function this is incorrect, as we should leave the logic to function implementation instead.Does this PR introduce any user-facing change?
Yes. Now null arguments shall be properly handled with magic method.
How was this patch tested?
Added new tests.