New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-46187][SQL] Align codegen and non-codegen implementation of StringDecode
#44094
Conversation
StringDecode
StringDecode
@srielau @cloud-fan One more expression where need align codegen/non-codegen and restrict encodings. |
@@ -2648,18 +2648,26 @@ case class StringDecode(bin: Expression, charset: Expression) | |||
|
|||
protected override def nullSafeEval(input1: Any, input2: Any): Any = { | |||
val fromCharset = input2.asInstanceOf[UTF8String].toString | |||
UTF8String.fromString(new String(input1.asInstanceOf[Array[Byte]], fromCharset)) | |||
try { | |||
UTF8String.fromString(new String(input1.asInstanceOf[Array[Byte]], fromCharset)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are here, can we rewrite this expression using StaticInvoke
? To avoid this problem completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same to Encode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we introduce the legacyCharsets
and supportedCharsets
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we introduce the legacyCharsets and supportedCharsets too?
Yep, let me merge this and introduce the restrictions + tests, update of the migration guide and so on.
Merging to master. Thank you, @cloud-fan and @beliefer for review. |
Late LGTM. |
…tringDecode` ### What changes were proposed in this pull request? In the PR, I propose to change the implementation of interpretation mode of `StringDecode` and apparently of the `decode` function. And make it consistent to codegen. Both implementation raise the same error with of the error class `INVALID_PARAMETER_VALUE.CHARSET`. ### Why are the changes needed? To make codegen and non-codegen of the `StringDecode` expression consistent. So, users will observe the same behaviour in both modes. ### Does this PR introduce _any_ user-facing change? Yes, if user code depends on error from `decode()`. ### How was this patch tested? By running the following test suites: ``` $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z string-functions.sql" $ build/sbt "core/testOnly *SparkThrowableSuite" $ build/sbt "test:testOnly *.StringFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#44094 from MaxGekk/align-codegen-stringdecode. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
In the PR, I propose to change the implementation of interpretation mode of
StringDecode
and apparently of thedecode
function. And make it consistent to codegen. Both implementation raise the same error with of the error classINVALID_PARAMETER_VALUE.CHARSET
.Why are the changes needed?
To make codegen and non-codegen of the
StringDecode
expression consistent. So, users will observe the same behaviour in both modes.Does this PR introduce any user-facing change?
Yes, if user code depends on error from
decode()
.How was this patch tested?
By running the following test suites:
Was this patch authored or co-authored using generative AI tooling?
No.