Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Nov 10, 2023

What changes were proposed in this pull request?

In the PR, I propose to change the implementation of interpretation mode, and make it consistent to codegen. Both implementation raise the same error with new error class INVALID_PARAMETER_VALUE.CHARSET.

Why are the changes needed?

To make codegen and non-codegen of the Encode expression consistent. So, users will observe the same behaviour in both modes.

Does this PR introduce any user-facing change?

Yes, if user code depends on error from encode().

How was this patch tested?

By running the following test suites:

$ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z string-functions.sql"
$ build/sbt "core/testOnly *SparkThrowableSuite"
$ build/sbt "test:testOnly *.StringFunctionsSuite"

Was this patch authored or co-authored using generative AI tooling?

No.

@MaxGekk MaxGekk changed the title [WIP][SQL] Align codegen and non-codegen implementation of Encode [SPARK-45887][SQL] Align codegen and non-codegen implementation of Encode Nov 10, 2023
@MaxGekk MaxGekk marked this pull request as ready for review November 10, 2023 14:24
@MaxGekk MaxGekk requested a review from cloud-fan November 10, 2023 14:25
@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 10, 2023

@cloud-fan @srielau In the PR, I made the Encode implementation consistent. Could you review this PR, please.

},
"CHARSET" : {
"message" : [
"expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got <charset>."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we give advice on the legacy configuration?

Copy link
Member Author

@MaxGekk MaxGekk Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to restrict the supported charsets in the code, and add a config for the legacy behaviour. In the following PR, I will modify the message and will add some advice.

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
nullSafeCodeGen(ctx, ev, (string, charset) =>
s"""
String toCharset = $charset.toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is defined already.

[info] - SPARK-22543: split large if expressions into blocks due to JVM code size limit *** FAILED *** (59 milliseconds)
[info]   java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 145, Column 8: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 145, Column 8: Redefinition of local variable "toCharset"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use val toCharset = ctx.freshName("toCharset")

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make the CI happy?

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one comment.

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
nullSafeCodeGen(ctx, ev, (string, charset) =>
s"""
String toCharset = $charset.toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use val toCharset = ctx.freshName("toCharset")

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 23, 2023

Merging to master. Thank you, @dongjoon-hyun @srielau @cloud-fan @beliefer @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in e470e74 Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants