[SPARK-50241][SQL] Replace NullIntolerant Mixin with Expression.nullIntolerant method #48772
cc @cloud-fan, thank you in advance
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @yaooqinn .
  isDeterministic: Boolean = true,
- scalarFunction: Option[ScalarFunction[_]] = None) extends InvokeLike {
+ scalarFunction: Option[ScalarFunction[_]] = None,
+ override val nullIntolerant: Boolean = false) extends InvokeLike {
I guess they are the same too. If you don't mind, I'd like to keep the PR AS-IS to minimize the potential impact of this refactoring work. Since nullIntolerant defaults to false and propagateNull defaults to true, mapping them blindly might result in many uncertain plan changes. I'd like to revisit and revise them in follow-ups; in the meantime, I'm fixing the RuntimeReplaceables, which use StaticInvoke and Invoke heavily. WDYT? @cloud-fan
OK, but at least we shouldn't add a new parameter here. We can just add
override def nullIntolerant = propagateNull
nullIntolerant and propagateNull currently have opposite defaults where they are defined and called. I guess we should leave them AS-IS and change them more carefully.
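To make the two options in this thread concrete, here is a minimal sketch. The classes `Expr`, `InvokeWithParam`, and `InvokeDerived` are hypothetical stand-ins, not Spark's real `Expression` or `StaticInvoke`; only the defaults (`nullIntolerant = false`, `propagateNull = true`) are taken from the comments above.

```scala
// Simplified stand-ins for illustration; these are NOT Spark's real classes.
abstract class Expr {
  // Default as described in the thread: not null-intolerant unless overridden.
  def nullIntolerant: Boolean = false
}

// Hypothetical option A: a separate constructor parameter, as in the diff.
case class InvokeWithParam(
    functionName: String,
    propagateNull: Boolean = true,
    override val nullIntolerant: Boolean = false) extends Expr

// Hypothetical option B: derive the flag from propagateNull, as suggested in review.
case class InvokeDerived(
    functionName: String,
    propagateNull: Boolean = true) extends Expr {
  override def nullIntolerant: Boolean = propagateNull
}

val a = InvokeWithParam("contains")
val b = InvokeDerived("contains")
```

With both constructed from defaults, `a.nullIntolerant` is false while `b.nullIntolerant` is true, which is why switching from one option to the other could change behavior for every existing call site that relies on the defaults.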
OK, but at least we shouldn't add a new parameter here. We can just add
override def nullIntolerant = propagateNull
I tested this change in a separate push, and the CI actions showed some test failures. I suggest we make that change separately.
Does this mean propagateNull and nullIntolerant are not exactly the same? I'd like to figure this out now, instead of adding a new parameter to StaticInvoke (not binary compatible) and then removing it later.
I didn't mean that. I mean propagateNull lacks some of the optimizations that are applied to the NullIntolerant trait.
So it's not even related to Invoke?
updated
case s: StaticInvoke
  if s.staticObject == classOf[ByteArrayMethods] &&
    Set("contains", "startsWith", "endsWith").contains(s.functionName) &&
    s.arguments.length == 2 =>
Why is this change needed?
  val isEvalOverrode = clazz.getMethod("eval", classOf[InternalRow]) !=
    superClass.getMethod("eval", classOf[InternalRow])
- val isNullIntolerantMixedIn = classOf[NullIntolerant].isAssignableFrom(clazz)
+ val isNullIntolerantMixedIn = clazz.getMethod("nullIntolerant") !=
does it mean overrideNullIntolerant?
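The reflection check in the diff above compares `java.lang.reflect.Method` objects to detect an override. A minimal sketch of that idea follows, using hypothetical stand-in classes (`BaseExpr`, `Plain`, `Strict`) and a hypothetical helper name rather than Spark's actual test code:

```scala
// Stand-in classes for illustration; not Spark's Expression hierarchy.
class BaseExpr {
  def nullIntolerant: Boolean = false
}
class Plain extends BaseExpr
class Strict extends BaseExpr {
  override def nullIntolerant: Boolean = true
}

// Hypothetical helper: true when the method resolved on `clazz` is declared
// somewhere below `base`, i.e. nullIntolerant has been overridden.
def overridesNullIntolerant(clazz: Class[_], base: Class[_]): Boolean =
  clazz.getMethod("nullIntolerant") != base.getMethod("nullIntolerant")
```

Note that this detects whether the method was overridden, not what it returns: a subclass that overrides `nullIntolerant` to return false would still be flagged, which is likely the distinction the naming question above is getting at.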
checkAnswer(df5, Seq(Row("amy"), Row("cathy"), Row("alex"), Row("david"), Row("jen")))

val df6 = sql("SELECT name FROM h2.test.employee WHERE " +
  "aes_decrypt(cast(null as binary), name) is null")
can we change the query to not use null input? We can use X'...' to write binary literal.
I left it as-is because 1) we already have a similar one above, and 2) the second parameter, name, gets a chance to be validated even though it's clearly not a valid input.
Well, the second point makes me consider a potential breaking change: null intolerance takes higher precedence for error-raising in certain cases.
Does this mean aes_decrypt should not set propagateNull to true?
I haven't done a full check, but I believe some parameter checks may occur in the original expressions, while others might be checked in the replacement, which could result in different behaviors.
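The precedence concern in this thread can be sketched in plain Scala. Everything here is an assumption for illustration: `decryptStrict` is a hypothetical stand-in (not Spark's `aes_decrypt`), and its key-length rule is only a placeholder for whatever validation the real expression performs.

```scala
// Hypothetical decrypt stand-in: rejects keys whose length is not 16, 24, or 32.
def decryptStrict(input: Array[Byte], key: String): Array[Byte] = {
  require(Set(16, 24, 32).contains(key.length), s"invalid key length: ${key.length}")
  input // real decryption elided; the sketch only cares about evaluation order
}

// Null-intolerant evaluation: any null argument yields null before the body runs,
// so the key is never validated.
def evalNullIntolerant(input: Array[Byte], key: String): Array[Byte] =
  if (input == null || key == null) null else decryptStrict(input, key)

// Null input with an invalid key: the short-circuit hides the validation error.
val shortCircuited = evalNullIntolerant(null, "amy")
// Calling the body directly with the same arguments surfaces the error instead.
val validated = scala.util.Try(decryptStrict(null, "amy"))
```

Under these assumptions the same arguments either return null or raise an error depending on whether the null check runs before the parameter check, which is the kind of behavior difference the comments above are worried about.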
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
This reverts commit 6369f26.
Merged to master. Thank you for the review @dongjoon-hyun @cloud-fan
What changes were proposed in this pull request?
Replace NullIntolerant Mixin with Expression.nullIntolerant method
Why are the changes needed?
#48758 (comment) via @cloud-fan
Does this PR introduce any user-facing change?
no
How was this patch tested?
Modified ExpressionInfoSuite.
Was this patch authored or co-authored using generative AI tooling?
No