@sunchao sunchao commented May 14, 2021

What changes were proposed in this pull request?

When creating Invoke and StaticInvoke for ScalarFunction's magic method, set propagateNull to false.

Why are the changes needed?

When propagateNull is true (the default), Invoke and StaticInvoke return null if any of the arguments is null. This is incorrect for scalar functions, since null handling should be left to the function implementation.
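A minimal sketch of the behavior described above, using a simplified stand-in rather than Spark's actual Invoke/StaticInvoke. The names `PropagateNullSketch`, `lengthOrZero`, and `invoke` are hypothetical and exist only for illustration.

```scala
// Simplified model of the short-circuit; not Spark's implementation.
object PropagateNullSketch {
  // Hypothetical null-tolerant "magic method": treats a null string as empty.
  def lengthOrZero(s: String): java.lang.Integer =
    if (s == null) Integer.valueOf(0) else Integer.valueOf(s.length)

  // With propagateNull set, a null argument yields null without the function
  // ever being called; otherwise the function decides what null means.
  def invoke(propagateNull: Boolean, arg: String): java.lang.Integer =
    if (propagateNull && arg == null) null else lengthOrZero(arg)
}
```

With `propagateNull = true` the function's own null handling is never reached, which is the behavior the PR description calls incorrect for scalar functions.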

Does this PR introduce any user-facing change?

Yes. Null arguments are now handled properly when a magic method is used.

How was this patch tested?

Added new tests.

@github-actions github-actions bot added the SQL label May 14, 2021

SparkQA commented May 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43085/

SparkQA commented May 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43085/

@dongjoon-hyun
Member

cc @cloud-fan

SparkQA commented May 14, 2021

Test build #138564 has finished for PR 32553 at commit 8c4dfb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@viirya viirya left a comment

looks correct.

Contributor

how about primitive type parameters?

Member Author

I think it shouldn't matter. Here we're making sure that the magic method will always be invoked regardless of null-ness of the arguments.

For primitive types this matters less, because even if propagateNull is true, needNullCheck in InvokeLike,

  protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)

also takes the nullability of the arguments into account.

Contributor

How can users handle nullable int values if the UDF is something like isPositive(int i), which can't accept a null argument?

Member Author

Ah, I see what you mean, thanks. It seems we should allow users to define the magic method with boxed primitive types for this case? We could also follow the behavior of ScalaUDF and return null if any primitive-typed parameter is nullable and its input is null; however, InvokeLike currently cannot handle the case where only a subset of the inputs are nullable primitives.

SparkQA commented May 17, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43169/

SparkQA commented May 17, 2021

Test build #138649 has finished for PR 32553 at commit df6eec2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 17, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43170/

SparkQA commented May 17, 2021

Test build #138650 has finished for PR 32553 at commit 6a87a27.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 18, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43171/

SparkQA commented May 18, 2021

Test build #138651 has finished for PR 32553 at commit 136867f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43177/

    def propagateNull: Boolean

    protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)
  + def propagateNullForPrimitive: Boolean

Contributor

This reminds me of the rule HandleNullInputsForUDF.

For primitive inputs, I don't think we have a choice and we must propagate null (isPositive(int i) can't handle null values). Can we detect it automatically instead of adding a new boolean flag?

Member Author

OK. Let me remove the flag then, and add an extra comment to propagateNull.

One issue with this approach is that it only applies to the magic method path, while for produceResult users will need to handle the null primitive values explicitly. I think we'll need to document this more carefully, otherwise it could cause confusion.

Contributor

Yeah, I agree. Users who want to handle null themselves should use boxed primitive types as the UDF parameter types.
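To illustrate the boxed-versus-primitive distinction discussed above, here is a small sketch. The object and method names (`BoxedMagicMethodSketch`, `isPositiveOrFalse`) are hypothetical, and the choice to map null to false is only one possible policy a function author might pick.

```scala
object BoxedMagicMethodSketch {
  // Primitive parameter: an Int cannot represent null, so the caller has to
  // deal with a null input before this method could ever be invoked.
  def isPositive(i: Int): Boolean = i > 0

  // Boxed parameter: null reaches the function, which applies its own policy.
  // Here (as an assumption for illustration) null is treated as "not positive".
  def isPositiveOrFalse(i: java.lang.Integer): Boolean =
    i != null && i.intValue > 0
}
```

The boxed variant is the one where turning off null propagation is meaningful, since only it can observe a null input.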

SparkQA commented May 18, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43177/

SparkQA commented May 18, 2021

Test build #138655 has finished for PR 32553 at commit 94fd2d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    case Some(m) if Modifier.isStatic(m.getModifiers) =>
      StaticInvoke(scalarFunc.getClass, scalarFunc.resultType(),
  -     MAGIC_METHOD_NAME, arguments, returnNullable = scalarFunc.isResultNullable)
  +     MAGIC_METHOD_NAME, arguments, propagateNull = false,

Member

So I think we need propagateNull as true, instead of false?

Member Author

If propagateNull is true, we'd return null directly even if input arguments are of non-primitive type, which is not what we want.

Member

Got it. Thanks.

  - protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)
  + protected lazy val needNullCheck: Boolean = needNullCheckForIndex.contains(true)
  + protected lazy val needNullCheckForIndex: Array[Boolean] =
  +   arguments.map(a => a.nullable && (propagateNull ||

Member

With this change, I think the meaning of propagateNull is somewhat different. Previously, null was propagated whenever propagateNull was true; now it also depends on whether the argument is of a primitive type. It would be better to update the propagateNull param doc.

Member

Oh, nvm. I saw you updated it below.
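The per-argument null check discussed in this thread can be sketched as follows. This is a standalone model, not the Spark source: `Arg` and its `isPrimitive` field are hypothetical stand-ins for a Catalyst expression, and because the quoted diff is cut off after `propagateNull ||`, the primitive-type condition is an assumption based on the review comments.

```scala
object NullCheckSketch {
  // Hypothetical stand-in for an argument expression.
  final case class Arg(nullable: Boolean, isPrimitive: Boolean)

  // Earlier definition quoted in the thread: one decision for all arguments.
  def needNullCheckOld(propagateNull: Boolean, args: Seq[Arg]): Boolean =
    propagateNull && args.exists(_.nullable)

  // Assumed per-argument rule: a nullable argument is null-checked when
  // propagateNull is set, or when its type is primitive and so cannot hold null.
  def needNullCheckForIndex(propagateNull: Boolean, args: Seq[Arg]): Array[Boolean] =
    args.map(a => a.nullable && (propagateNull || a.isPrimitive)).toArray

  // A null check is needed overall if any single argument needs one.
  def needNullCheck(propagateNull: Boolean, args: Seq[Arg]): Boolean =
    needNullCheckForIndex(propagateNull, args).contains(true)
}
```

Under this model, setting `propagateNull = false` still null-checks nullable primitive arguments, while nullable boxed or object arguments are passed through to the function.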

SparkQA commented May 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43187/

SparkQA commented May 18, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43187/

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 44d762a May 18, 2021

SparkQA commented May 18, 2021

Test build #138666 has finished for PR 32553 at commit fb0d959.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

sunchao commented May 18, 2021

Thanks @cloud-fan and @viirya for the review!

@sunchao sunchao deleted the SPARK-35389 branch May 18, 2021 15:01