Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Nov 6, 2024

What changes were proposed in this pull request?

Replace NullIntolerant Mixin with Expression.nullIntolerant method

Why are the changes needed?

#48758 (comment) via @cloud-fan

This is not the first time that we are restricted by the trait-based tagging system. Extending a trait is static as it happens at compile but we really want to be more dynamic at runtime. For example, we added Expression#stateful and removed the trait StatefulExpression a while ago. I think the same thing happens to NullIntelerant now. Shall we also add Expression#nullIntolerant?

Does this PR introduce any user-facing change?

no

How was this patch tested?

Modified ExpressionInfoSuite,

  • Check whether SQL expressions should extend NullIntolerant

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 6, 2024
@yaooqinn
Copy link
Member Author

yaooqinn commented Nov 6, 2024

cc @cloud-fan, thank you in advance

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @yaooqinn .

isDeterministic: Boolean = true,
scalarFunction: Option[ScalarFunction[_]] = None) extends InvokeLike {
scalarFunction: Option[ScalarFunction[_]] = None,
override val nullIntolerant: Boolean = false) extends InvokeLike {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag is the same as propagateNull?

BTW I kind of like the name propagateNull more as it's more straightforward. But I'm not a native English speaker and I could be wrong. cc @dtenedor @srielau

Copy link
Member Author

@yaooqinn yaooqinn Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess they are the same too. If you don't mind, I'd like too keep the PR in AS-IS state to minimize the potential influence of this refactoring work. Since nullIntolerant defaults to false,propagateNull defaults to true,mapping them blindly might result in many unsure plan changes,I‘d like to revist and revise them in followups, in the meanwhile I'm fixing the RuntimeReplaceables which use StaticInvoke and Invoke heavily? WDYT? @cloud-fan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but at lease we shouldn't add a new parameter here. We can just add

override def nullIntolerant = propagateNull

Copy link
Member Author

@yaooqinn yaooqinn Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullIntolerant and propagateNull currently have opposite defaults in where they are defined and called, I guess we shall leave them AS-IS and change them more carefully

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but at lease we shouldn't add a new parameter here. We can just add

override def nullIntolerant = propagateNull

I have tested this change in a separate push, the actions did show some test failures to me. I suggest we do that change separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean propagateNull and nullIntolerant are not exactly the same? I'd like to figure this out now, instead of adding a new parameter to StaticInvoke (not binary compatible) and then remove it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean that, I mean propagateNull lacks of some optimizations applied to the NullIntolerant trait

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's not even related to Invoke?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

case s: StaticInvoke
if s.staticObject == classOf[ByteArrayMethods] &&
Set("contains", "startsWith", "endsWith").contains(s.functionName) &&
s.arguments.length == 2 =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change?

val isEvalOverrode = clazz.getMethod("eval", classOf[InternalRow]) !=
superClass.getMethod("eval", classOf[InternalRow])
val isNullIntolerantMixedIn = classOf[NullIntolerant].isAssignableFrom(clazz)
val isNullIntolerantMixedIn = clazz.getMethod("nullIntolerant") !=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean overrideNullIntolerant?

checkAnswer(df5, Seq(Row("amy"), Row("cathy"), Row("alex"), Row("david"), Row("jen")))

val df6 = sql("SELECT name FROM h2.test.employee WHERE " +
"aes_decrypt(cast(null as binary), name) is null")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change the query to not use null input? We can use X'...' to write binary literal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leave it as-is because 1) we already have a similar one above and 2) the second parameter name get a chance to be validated while it's clearly not a valid input

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the second point makes me consider a potential breaking change: null intolerance takes higher precedence for error-raising in certain cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean aes_decrypt should not set propagateNull to true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done a full check, but I believe some parameter checks may occur in the original expressions, while others might be checked in the replacement, which could result in different behaviors.

@github-actions github-actions bot added the INFRA label Nov 12, 2024
This reverts commit 6369f26.
@github-actions github-actions bot removed the INFRA label Nov 13, 2024
@yaooqinn yaooqinn closed this in a84ca5e Nov 13, 2024
@yaooqinn
Copy link
Member Author

Merged to master. Thank you for the review @dongjoon-hyun @cloud-fan

@yaooqinn yaooqinn deleted the SPARK-50241 branch November 13, 2024 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants