New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-37047][SQL][FOLLOWUP] Add legacy flag for the breaking change of lpad and rpad for binary type #36103
Conversation
Can one of the admins verify this patch? |
Hi @mkaravel @cloud-fan , would you take a look? Thank you! |
@@ -3709,6 +3709,17 @@ object SQLConf { | |||
.booleanConf | |||
.createWithDefault(true) | |||
|
|||
val LPAD_RPAD_FOR_BINARY_TYPE = | |||
buildConf("spark.sql.legacy.lpadRpadForBinaryType.enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the config name sounds like it goes to the legacy behavior for binary input if it's set it true. Maybe we should use false
as its default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to
val LEGACY_LPAD_RPAD_BINARY_TYPE_AS_STRING =
buildConf("spark.sql.legacy.lpadRpadBinaryTypeAsString")
with default value false
. Sounds better?
@@ -2776,7 +2776,11 @@ object functions { | |||
* @since 3.3.0 | |||
*/ | |||
def lpad(str: Column, len: Int, pad: Array[Byte]): Column = withExpr { | |||
BinaryPad("lpad", str.expr, lit(len).expr, lit(pad).expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's just create UnresolvedFunction
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean when the legacy is restored, and pad
is binary? Why is it a UnresolvedFunction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By using UnresolveFunction
, we will go through the function look up code path and your change in PadExpressionBuilderBase
will kick in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thanks, never know that.
@@ -108,6 +111,46 @@ SELECT lpad(x'57', 5, 'abc'); | |||
SELECT rpad('abc', 5, x'57'); | |||
SELECT rpad(x'57', 5, 'abc'); | |||
|
|||
-- Dsiable the lpad rpad binary breaking change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's probably an overkill to test the flag like this... I think the change in StringExpressionsSuite
is good enough
Seq(Literal("hi"), Literal(5), Literal("somepadding"))) | ||
|
||
if (SQLConf.get.getConf(SQLConf.LPAD_RPAD_FOR_BINARY_TYPE)) { | ||
assert(lpadExp1.asInstanceOf[BinaryPad] == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we don't need .asInstanceOf[BinaryPad]
assert(lpadExp1.asInstanceOf[BinaryPad] == | ||
BinaryPad("lpad", Literal(Array[Byte]()), Literal(5), Literal(Array[Byte](0)))) | ||
assert(lpadExp2.asInstanceOf[StringLPad] == | ||
StringLPad(Literal("hi"), Literal(5), Literal(" "))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put the ones that have the same result before the if-else
? to highlight what gets changed if we flip the config.
@@ -3709,6 +3709,18 @@ object SQLConf { | |||
.booleanConf | |||
.createWithDefault(true) | |||
|
|||
val LEGACY_LPAD_RPAD_BINARY_TYPE_AS_STRING = | |||
buildConf("spark.sql.legacy.lpadRpadBinaryTypeAsString") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about spark.sql.legacy.lpadRpadAlwaysReturnString
? If true, it's the legacy behavior and the pad function always return string even for binary inputs. If false, the function returns binary if the first argument and the padding pattern parameter are both binary type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! This sounds better. Naming is hard ..
@@ -74,6 +74,9 @@ SELECT btrim(encode('xxxbarxxx', 'utf-8'), encode('x', 'utf-8')); | |||
SELECT lpad('hi', 'invalid_length'); | |||
SELECT rpad('hi', 'invalid_length'); | |||
|
|||
-- Enable the lpad rpad function for binary input type | |||
SET spark.sql.legacy.lpadRpadAlwaysReturnString=false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to change this sql file
The test job running: https://github.com/anchovYu/spark/actions/runs/2156689301 |
LGTM, let's update the migration guide to mention this conf |
thanks, merging to master/3.3 (last commit only changes migration guide) |
…of lpad and rpad for binary type ### What changes were proposed in this pull request? Add a legacy flag `spark.sql.legacy.lpadRpadForBinaryType.enabled` for the breaking change introduced in #34154. The flag is enabled by default. When it is disabled, restore the pre-change behavior that there is no special handling on `BINARY` input types. ### Why are the changes needed? The original commit is a breaking change, and breaking changes should be encouraged to add a flag to turn it off for smooth migration between versions. ### Does this PR introduce _any_ user-facing change? With the default value of the conf, there is no user-facing difference. If users turn this conf off, they can restore the pre-change behavior. ### How was this patch tested? Through unit tests. Closes #36103 from anchovYu/flags-lpad-rpad-binary. Authored-by: Xinyi Yu <xinyi.yu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e2683c2) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Add a legacy flag
spark.sql.legacy.lpadRpadAlwaysReturnString
for the breaking change introduced in #34154.The flag is disabled by default. When it is enabled, restore the pre-change behavior that there is no special handling on
BINARY
input types.Why are the changes needed?
The original commit is a breaking change, and breaking changes should be encouraged to add a flag to turn it off for smooth migration between versions.
Does this PR introduce any user-facing change?
With the default value of the conf, there is no user-facing difference.
If users turn this conf on, they can restore the pre-change behavior.
How was this patch tested?
Through unit tests.