[SPARK-54552][CONNECT] Fix `SparkConnectResultSet.getString` to handle BINARY data type with `UTF_8` #53262

vinodkc · 2025-11-29T00:20:40Z

What changes were proposed in this pull request?

Fixed `SparkConnectResultSet.getString()` to decode BINARY values as UTF-8 strings instead of returning the byte array's default object reference (e.g., "[B@<hashcode>").

Why are the changes needed?

The current implementation violates JDBC specification behavior. Users calling getString() on BINARY columns expect UTF-8 decoded strings, not Java object references.
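To make the symptom concrete, here is a minimal standalone sketch of the two behaviors in plain Java. The class and method names (`BinaryToStringDemo`, `naiveToString`, `decodeUtf8`) are hypothetical and exist only for illustration; this is not the Spark code path itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of Spark.
final class BinaryToStringDemo {

    // Falls back to Object.toString() on a byte[], which yields the
    // JVM's array reference form: "[B@" followed by a hex hash code.
    static String naiveToString(byte[] value) {
        return String.valueOf((Object) value);
    }

    // Decodes the same bytes as UTF-8 text.
    static String decodeUtf8(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(naiveToString(bytes)); // starts with "[B@"; the hash part varies per run
        System.out.println(decodeUtf8(bytes));    // prints "hello"
    }
}
```

The hash suffix in the first form carries no information about the column's contents, which is why it is of no use to a JDBC client.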

Before

SELECT binary('xDeAdBeEf')

spark-sql: `\xDeAdBeEf`
beeline with STS: `\xDeAdBeEf`
beeline with Connect Server: `[B@4d518c66`

After

SELECT binary('xDeAdBeEf')

spark-sql: `\xDeAdBeEf`
beeline with STS: `\xDeAdBeEf`
beeline with Connect Server: `\xDeAdBeEf`

Does this PR introduce any user-facing change?

Yes. getString() on BINARY columns now returns UTF-8 decoded strings instead of byte array references like "[B@1a2b3c4d".
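The conversion rule described here could be sketched roughly as follows. This is an assumption about the general shape of such a helper, not the code from this PR: `GetStringSketch` and `asString` are hypothetical names, and the real `SparkConnectResultSet` may handle more types and edge cases.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the conversion rule; not Spark's actual code.
final class GetStringSketch {

    // Returns the String a getString-style accessor would hand back for a column value.
    static String asString(Object value) {
        if (value == null) {
            return null; // SQL NULL maps to Java null
        }
        if (value instanceof byte[]) {
            // BINARY: decode the bytes as UTF-8 instead of calling toString()
            return new String((byte[]) value, StandardCharsets.UTF_8);
        }
        return value.toString(); // other types keep their usual string form
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0x53, (byte) 0x70, (byte) 0x61, (byte) 0x72, (byte) 0x6B};
        System.out.println(asString(binary)); // prints "Spark"
        System.out.println(asString(42));     // prints "42"
    }
}
```

Checking for `byte[]` before the generic `toString()` fallback is the key ordering choice: without it, BINARY values fall through to the array reference form.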

How was this patch tested?

Added a new test.

Was this patch authored or co-authored using generative AI tooling?

No

vinodkc · 2025-11-29T00:21:54Z

CC @pan3793, @dongjoon-hyun

dongjoon-hyun

+1, LGTM. Thank you, @vinodkc.

…e BINARY data type with `UTF_8`

Closes #53262 from vinodkc/br_fix_getString_BINARY.

Authored-by: vinodkc <vinod.kc.in@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit f5b9ea8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

dongjoon-hyun · 2025-11-29T02:58:09Z

Merged to master/4.1 for Apache Spark 4.1.0.

Fix BINARY getString

5c32c65

github-actions bot added SQL CONNECT labels Nov 29, 2025

dongjoon-hyun changed the title ~~[SPARK-54552][CONNECT] Fix getString() method for BINARY data type in SparkConnectResultSet~~ [SPARK-54552][CONNECT] Fix SparkConnectResultSet.getString to handle BINARY data type with UTF_8 Nov 29, 2025

dongjoon-hyun approved these changes Nov 29, 2025

dongjoon-hyun closed this in f5b9ea8 Nov 29, 2025
