[SPARK-45887][SQL] Align codegen and non-codegen implementation of `Encode` #43759

MaxGekk · 2023-11-10T14:23:05Z

What changes were proposed in this pull request?

In this PR, I propose changing the implementation of the interpreted mode to make it consistent with codegen. Both implementations raise the same error with the new error class INVALID_PARAMETER_VALUE.CHARSET.
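
The intended behaviour can be illustrated in plain Java, the language that Spark's codegen emits. This is only a minimal sketch, not the code from this PR: the class `EncodeSketch`, the method `encode`, the message text, and the use of `IllegalArgumentException` are assumptions made for the example, whereas Spark raises its own error with the class INVALID_PARAMETER_VALUE.CHARSET.

```java
import java.io.UnsupportedEncodingException;

final class EncodeSketch {
    // Hypothetical helper: encodes `input` with the named charset and maps the
    // JDK's checked exception to one uniform error, so that every caller
    // (interpreted or generated code) reports the failure the same way.
    static byte[] encode(String input, String charsetName) {
        try {
            return input.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            // Assumed message format; the real error class lives in Spark.
            throw new IllegalArgumentException(
                "[INVALID_PARAMETER_VALUE.CHARSET] unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A supported charset encodes normally.
        System.out.println(encode("abc", "UTF-8").length);
        // An unknown charset name surfaces as the uniform error.
        try {
            encode("abc", "WINDOWS-XYZ");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Routing both execution paths through one error-producing function is what keeps the two modes from drifting apart again.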

Why are the changes needed?

To make the codegen and non-codegen implementations of the Encode expression consistent, so that users observe the same behaviour in both modes.

Does this PR introduce any user-facing change?

Yes, if user code depends on the error raised by encode().

How was this patch tested?

By running the following test suites:

$ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z string-functions.sql"
$ build/sbt "core/testOnly *SparkThrowableSuite"
$ build/sbt "test:testOnly *.StringFunctionsSuite"

Was this patch authored or co-authored using generative AI tooling?

No.

MaxGekk · 2023-11-10T14:26:41Z

@cloud-fan @srielau In this PR, I made the Encode implementation consistent. Could you review it, please?

srielau · 2023-11-10T15:40:38Z

common/utils/src/main/resources/error/error-classes.json

      },
+      "CHARSET" : {
+        "message" : [
+          "expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got <charset>."


Should we give advice on the legacy configuration?

I plan to restrict the supported charsets in the code and add a config for the legacy behaviour. In a follow-up PR, I will modify the message and add some advice.

dongjoon-hyun · 2023-11-12T23:00:03Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    nullSafeCodeGen(ctx, ev, (string, charset) =>
      s"""
+        String toCharset = $charset.toString();


It seems this is defined already.

[info] - SPARK-22543: split large if expressions into blocks due to JVM code size limit *** FAILED *** (59 milliseconds)
[info]   java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 145, Column 8: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 145, Column 8: Redefinition of local variable "toCharset"

Please use val toCharset = ctx.freshName("toCharset")
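
The compile failure quoted above occurs when the same hard-coded local variable name is emitted more than once into a single generated method. The idea behind fresh names can be sketched in Java as follows. `FreshNames` is a hypothetical stand-in and not Spark's `CodegenContext`; its naming scheme (base name plus a per-name counter) is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class FreshNames {
    // Next numeric suffix to hand out for each base name.
    private final Map<String, Integer> nextId = new HashMap<>();

    // Returns a variable name that has not been returned before by this
    // instance (assumed scheme: "<base>_<counter>").
    String freshName(String base) {
        int id = nextId.merge(base, 1, Integer::sum) - 1;
        return base + "_" + id;
    }

    // Emits one Java statement declaring a local String variable.
    static String declare(String varName, String charsetExpr) {
        return "String " + varName + " = " + charsetExpr + ".toString();";
    }

    public static void main(String[] args) {
        FreshNames ctx = new FreshNames();
        // Two expressions inlined into the same generated method: each one
        // gets its own variable, so the declarations cannot collide.
        System.out.println(declare(ctx.freshName("toCharset"), "value_1"));
        System.out.println(declare(ctx.freshName("toCharset"), "value_2"));
    }
}
```

With a fixed name, both statements would declare `toCharset` in the same scope, which is exactly the redefinition the compiler rejected.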

dongjoon-hyun

Could you make the CI happy?

beliefer

LGTM except one comment.

beliefer · 2023-11-13T03:27:36Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    nullSafeCodeGen(ctx, ev, (string, charset) =>
      s"""
+        String toCharset = $charset.toString();


Please use val toCharset = ctx.freshName("toCharset")


MaxGekk · 2023-11-23T15:32:45Z

Merging to master. Thank you, @dongjoon-hyun @srielau @cloud-fan @beliefer @HyukjinKwon, for the review.

MaxGekk added 3 commits November 10, 2023 16:22

Add an error class

c96f71e

Re-gen sql-error-conditions-invalid-parameter-value-error-class.md

1179763

Add tests and re-gen golden files

40aa7b0

github-actions bot added SQL DOCS labels Nov 10, 2023

MaxGekk changed the title ~~[WIP][SQL] Align codegen and non-codegen implementation of Encode~~ [SPARK-45887][SQL] Align codegen and non-codegen implementation of Encode Nov 10, 2023

MaxGekk marked this pull request as ready for review November 10, 2023 14:24

Trigger build

79fc708

MaxGekk requested a review from cloud-fan November 10, 2023 14:25

srielau approved these changes Nov 10, 2023


HyukjinKwon approved these changes Nov 12, 2023


cloud-fan approved these changes Nov 12, 2023


dongjoon-hyun reviewed Nov 12, 2023


beliefer approved these changes Nov 13, 2023


MaxGekk added 2 commits November 23, 2023 14:19

Merge remote-tracking branch 'origin/master' into restrict-charsets-i…

f1c9425

…n-encode

Fix a test

ac5f535

MaxGekk closed this in e470e74 Nov 23, 2023
