[SPARK-45887][SQL] Align codegen and non-codegen implementation of Encode
#43759
|
@cloud-fan @srielau In the PR, I made the |
    },
    "CHARSET" : {
      "message" : [
        "expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got <charset>."
Should we give advice on the legacy configuration?
I plan to restrict the supported charsets in the code, and add a config for the legacy behaviour. In the following PR, I will modify the message and will add some advice.
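As a rough illustration of the behaviour being discussed, the sketch below turns an unsupported charset name into an error carrying the message text from the diff above. It is not Spark's implementation: `EncodeSketch`, `InvalidCharsetException`, and `encode` are hypothetical names, and the only library call assumed is the JDK's `String.getBytes(charsetName)`.

```scala
import java.io.UnsupportedEncodingException

object EncodeSketch {
  // Charsets named in the error message quoted in the diff above.
  val listedCharsets: Seq[String] =
    Seq("US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16")

  // Hypothetical stand-in for the error raised under INVALID_PARAMETER_VALUE.CHARSET.
  final class InvalidCharsetException(message: String)
    extends IllegalArgumentException(message)

  // Builds the message by filling in the <charset> placeholder.
  def charsetMessage(charset: String): String = {
    val expected = listedCharsets.map(c => s"'$c'").mkString(", ")
    s"expects one of the charsets $expected, but got $charset."
  }

  // Encodes the input and maps the JDK's checked exception to the single
  // error type that both execution modes would be expected to surface.
  def encode(input: String, charset: String): Array[Byte] =
    try input.getBytes(charset)
    catch {
      case _: UnsupportedEncodingException =>
        throw new InvalidCharsetException(charsetMessage(charset))
    }
}
```

With this sketch, `EncodeSketch.encode("abc", "UTF-8")` returns the three ASCII bytes, while a name the JVM does not recognise fails with the message above. Note that the sketch only rejects charsets the JVM cannot resolve; restricting the set to the listed names is the follow-up change mentioned in the comment.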
    override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
      nullSafeCodeGen(ctx, ev, (string, charset) =>
        s"""
          String toCharset = $charset.toString();
It seems this is defined already.
[info] - SPARK-22543: split large if expressions into blocks due to JVM code size limit *** FAILED *** (59 milliseconds)
[info] java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 145, Column 8: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 145, Column 8: Redefinition of local variable "toCharset"
Please use val toCharset = ctx.freshName("toCharset")
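To show why a fresh name avoids the "Redefinition of local variable" failure, here is a minimal sketch of the idea. `FreshNames` and `FreshNameSketch` are hypothetical stand-ins written for this illustration; they are not Spark's `CodegenContext`, and the exact naming scheme Spark uses may differ.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the name generator behind ctx.freshName.
final class FreshNames {
  private val counter = new AtomicInteger(0)

  // Returns the prefix with a unique numeric suffix on every call.
  def freshName(prefix: String): String =
    s"${prefix}_${counter.getAndIncrement()}"
}

object FreshNameSketch {
  // Emits a Java statement whose local variable name is unique per call,
  // so two snippets can be placed in the same generated method.
  def genToCharset(names: FreshNames, charsetExpr: String): String = {
    val toCharset = names.freshName("toCharset")
    s"String $toCharset = $charsetExpr.toString();"
  }
}
```

Calling `genToCharset` twice with the same `FreshNames` instance yields declarations of `toCharset_0` and `toCharset_1`, whereas a hard-coded `toCharset` would be declared twice once both snippets land in one generated method.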
Could you make the CI happy?
LGTM except one comment.
Merging to master. Thank you, @dongjoon-hyun @srielau @cloud-fan @beliefer @HyukjinKwon for review.
What changes were proposed in this pull request?
In this PR, I propose to change the implementation of the interpreted mode to make it consistent with codegen. Both implementations raise the same error with the new error class INVALID_PARAMETER_VALUE.CHARSET.
Why are the changes needed?
To make the codegen and non-codegen implementations of the Encode expression consistent, so that users observe the same behaviour in both modes.
Does this PR introduce any user-facing change?
Yes, if user code depends on the error from encode().
How was this patch tested?
By running the following test suites:
Was this patch authored or co-authored using generative AI tooling?
No.