vinodkc (Contributor) commented Nov 29, 2025

What changes were proposed in this pull request?

Fixed SparkConnectResultSet.getString() to properly convert BINARY data to UTF-8 strings instead of returning byte array object references (e.g., "[B@").

Why are the changes needed?

The current implementation violates JDBC specification behavior. Users calling getString() on BINARY columns expect UTF-8 decoded strings, not Java object references.
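
The "[B@..." value is what Java's default `Object.toString()` produces for a `byte[]`. The sketch below is a minimal standalone illustration of that difference, not the actual `SparkConnectResultSet` code; the class and method names are hypothetical, and it assumes the fix decodes the bytes with `StandardCharsets.UTF_8`, as the description states.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Spark.
final class BinaryToStringDemo {
    // Roughly what callers saw before the fix: the array's default
    // toString(), i.e. "[B@" followed by an identity hash code in hex.
    static String viaToString(byte[] bytes) {
        return bytes.toString();
    }

    // What the description says getString() now returns:
    // the bytes decoded as UTF-8.
    static String viaUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "xDeAdBeEf".getBytes(StandardCharsets.UTF_8);
        // The suffix after "[B@" varies from run to run.
        System.out.println(viaToString(bytes));
        System.out.println(viaUtf8(bytes));
    }
}
```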

Before

SELECT binary('xDeAdBeEf')

spark-sql: `\xDeAdBeEf`
beeline with STS: `\xDeAdBeEf`
beeline with Connect Server: `[B@4d518c66`

After

SELECT binary('xDeAdBeEf')

spark-sql: `\xDeAdBeEf`
beeline with STS: `\xDeAdBeEf`
beeline with Connect Server: `\xDeAdBeEf`

Does this PR introduce any user-facing change?

Yes. getString() on BINARY columns now returns UTF-8 decoded strings instead of byte array references like "[B@1a2b3c4d".
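
One consequence worth keeping in mind: if the decoding is done with `new String(bytes, StandardCharsets.UTF_8)` (an assumption here, the PR text only says "UTF-8 decoded"), byte sequences that are not valid UTF-8 are replaced with U+FFFD, so the string cannot always be converted back to the original bytes. The sketch below illustrates this with hypothetical helper names; callers that need the raw bytes would presumably still use `getBytes()` on the result set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Spark.
final class LossyDecodeSketch {
    // Decodes bytes as UTF-8; malformed input becomes U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // True when decoding and re-encoding yields the original bytes.
    static boolean roundTrips(byte[] bytes) {
        byte[] reencoded = decodeUtf8(bytes).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(reencoded, bytes);
    }

    public static void main(String[] args) {
        byte[] text = "xDeAdBeEf".getBytes(StandardCharsets.UTF_8);
        byte[] notUtf8 = {(byte) 0xFF, (byte) 0xFE}; // 0xFF never appears in valid UTF-8

        System.out.println(roundTrips(text));    // true
        System.out.println(roundTrips(notUtf8)); // false
    }
}
```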

How was this patch tested?

Added new test

Was this patch authored or co-authored using generative AI tooling?

No

vinodkc (Contributor, Author) commented Nov 29, 2025

CC @pan3793, @dongjoon-hyun

dongjoon-hyun changed the title from "[SPARK-54552][CONNECT] Fix getString() method for BINARY data type in SparkConnectResultSet" to "[SPARK-54552][CONNECT] Fix SparkConnectResultSet.getString to handle BINARY data type with UTF_8" on Nov 29, 2025

dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @vinodkc.

dongjoon-hyun pushed a commit that referenced this pull request Nov 29, 2025
…e BINARY data type with `UTF_8`

### What changes were proposed in this pull request?

Fixed `SparkConnectResultSet.getString()` to properly convert BINARY data to UTF-8 strings instead of returning byte array object references (e.g., "[B@<hashcode>").

### Why are the changes needed?

The current implementation violates JDBC specification behavior. Users calling getString() on BINARY columns expect UTF-8 decoded strings, not Java object references.

Before
```
SELECT binary('xDeAdBeEf')

spark-sql: `\xDeAdBeEf`
beeline with STS: `\xDeAdBeEf`
beeline with Connect Server: `[B@4d518c66`
```

After
```
SELECT binary('xDeAdBeEf')

spark-sql: `\xDeAdBeEf`
beeline with STS: `\xDeAdBeEf`
beeline with Connect Server: `\xDeAdBeEf`
```

### Does this PR introduce _any_ user-facing change?

Yes. getString() on BINARY columns now returns UTF-8 decoded strings instead of byte array references like "[B@1a2b3c4d".

### How was this patch tested?

Added new test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #53262 from vinodkc/br_fix_getString_BINARY.

Authored-by: vinodkc <vinod.kc.in@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit f5b9ea8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

dongjoon-hyun (Member) commented

Merged to master/4.1 for Apache Spark 4.1.0.
