
Conversation

@cty123
Contributor

@cty123 cty123 commented Nov 7, 2025

What changes were proposed in this pull request?

Spark Connect already supports the JDBC protocol for a few commonly used SQL data types, but it is currently missing support for the Decimal type, which is also very commonly used (for example, to store monetary values). This PR adds support for the Decimal data type.

Why are the changes needed?

Right now, a user can read Decimal data over JDBC only by converting the data to a string and then parsing the string into a Java BigDecimal object. Since the driver already has the value available as a Java BigDecimal, we can avoid converting it back and forth and instead pass through the value obtained from the underlying result set.
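As an illustration, here is a minimal sketch of the difference from the caller's point of view; the query, column name, and the `conn` connection below are hypothetical, assuming an open Spark Connect JDBC connection:

```scala
// Hypothetical example: `conn` is an open java.sql.Connection to a Spark Connect server.
val stmt = conn.createStatement()

// Workaround before this change: cast the decimal to a string in SQL,
// then parse it back into java.math.BigDecimal on the client side.
val rs1 = stmt.executeQuery(
  "SELECT CAST(CAST(12345.67 AS DECIMAL(10, 2)) AS STRING) AS price")
rs1.next()
val viaString = new java.math.BigDecimal(rs1.getString(1))

// With this change: fetch the DECIMAL column directly as java.math.BigDecimal.
val rs2 = stmt.executeQuery("SELECT CAST(12345.67 AS DECIMAL(10, 2)) AS price")
rs2.next()
val viaDecimal = rs2.getBigDecimal(1)
```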

Does this PR introduce any user-facing change?

It is part of a new feature under Spark Connect JDBC support.

How was this patch tested?

I created a new unit test named 'get decimal type' that covers my changes. The test case also aligns with the tests for fetching the other data types.
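Roughly, the test exercises a query like the following (a sketch only; the actual suite's connection setup and helpers differ):

```scala
// Sketch of the kind of assertion the new test makes; `connection` is assumed to be
// an open java.sql.Connection whose result sets are backed by SparkConnectResultSet.
val rs = connection.createStatement()
  .executeQuery("SELECT CAST(1234.56 AS DECIMAL(10, 2)) AS amount")
assert(rs.next())
assert(rs.getBigDecimal(1) == new java.math.BigDecimal("1234.56"))
assert(!rs.wasNull())
assert(!rs.next())
```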

Was this patch authored or co-authored using generative AI tooling?

No

@cty123
Contributor Author

cty123 commented Nov 8, 2025

@pan3793 I see you have been working on this story; perhaps it's better that you review it.

@cty123 cty123 changed the title [SPARK-53484][CONNECT] Supports Decimal type data in SparkConnectResultSet. [SPARK-54205][CONNECT] Supports Decimal type data in SparkConnectResultSet. Nov 8, 2025
case DoubleType => 24
case StringType =>
getPrecision(field)
case DecimalType.Fixed(p, s) => p + (if (s == 0) 0 else 1)
Member

I think we should consider more cases, for example,

    // scale = precision => only fractional digits. +3 for decimal point, sign and leading zero
    case DecimalType.Fixed(p, s) if s == p => p + 3
    // scale = 0 => only integral part. +1 for sign
    case DecimalType.Fixed(p, s) if s == 0 => p + 1
    // scale > 0 => both integral and fractional part. +2 for sign and decimal point
    case DecimalType.Fixed(p, s) => p + 2

Contributor Author

@cty123 cty123 Nov 8, 2025

Oh, that's a good point; I totally forgot about the sign and leading-zero scenarios. I will also add some test cases to cover them.

Contributor Author

@cty123 cty123 Nov 8, 2025

Updated the logic for the edge cases, and added a few test cases to cover them.
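For reference, a quick standalone sanity check of what those display widths work out to (not the actual test code; the helper name here is made up):

```scala
// Hypothetical helper mirroring the display-size rules discussed above.
def decimalDisplaySize(p: Int, s: Int): Int =
  if (s == p) p + 3      // only fractional digits: minus sign + leading zero + decimal point
  else if (s == 0) p + 1 // only integral digits: minus sign
  else p + 2             // both parts: minus sign + decimal point

// Widest textual forms for a few example decimal types.
assert("-0.12345".length == decimalDisplaySize(5, 5)) // DECIMAL(5,5) -> 8
assert("-12345".length == decimalDisplaySize(5, 0))   // DECIMAL(5,0) -> 6
assert("-123.45".length == decimalDisplaySize(5, 2))  // DECIMAL(5,2) -> 7
```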

@pan3793
Member

pan3793 commented Nov 8, 2025

@cty123 thank you for working on this new feature.

cty123 and others added 2 commits November 8, 2025 13:23
…nnect/client/jdbc/util/JdbcTypeUtils.scala

Co-authored-by: Cheng Pan <pan3793@gmail.com>
case StringType =>
getPrecision(field)
case DecimalType.Fixed(p, s) => p + (if (s == 0) 0 else 1)
// precision + sign(+/-) + leading zero + decimal point, like DECIMAL(5,5) = -0.12345
Member

The `-` sign is displayed, but `+` is not.

Contributor Author

I updated the comment to make it clearer that the extra character is for the negative sign.

Member

@pan3793 pan3793 left a comment

LGTM, cc @LuciferYang, could you take a look?

override def getBigDecimal(columnIndex: Int): java.math.BigDecimal =
throw new SQLFeatureNotSupportedException
override def getBigDecimal(columnIndex: Int): java.math.BigDecimal = {
if (currentRow.isNullAt(columnIndex - 1)) {
Contributor

  1. Should we call checkOpen() first?
  2. Should we ensure that columnIndex is not out of bounds?

also cc @pan3793

Member

Makes sense, but I think we can defer this to an independent patch, since the other get* methods have the same issue.

Contributor Author

  1. I am experimenting with this against other JDBC driver implementations.

  2. Currently, if the index is out of bounds, it throws an out-of-bounds exception:

[info] - get decimal type *** FAILED *** (78 milliseconds)
[info]   java.lang.ArrayIndexOutOfBoundsException: Index 998 out of bounds for length 1
[info]   at org.apache.spark.sql.catalyst.expressions.GenericRow.get(rows.scala:37)
[info]   at org.apache.spark.sql.Row.isNullAt(Row.scala:216)
[info]   at org.apache.spark.sql.Row.isNullAt$(Row.scala:216)
[info]   at org.apache.spark.sql.catalyst.expressions.GenericRow.isNullAt(rows.scala:28)

Do we still need to check it ourselves? Or should we catch it and wrap it in another exception?

And yes, I think we can have a separate PR to address the two issues. For the second one, if we check the bounds ourselves, I would do it inside the isNullAt function instead of calling another checker function inside each getter function.

Contributor

Handling this issue uniformly in a separate PR is fine with me.

Contributor Author

@cty123 cty123 Nov 10, 2025

I just tested it; you are right.

  1. For every getter function, if the statement is closed, the ResultSet should be unusable. I have verified this with the MySQL and PostgreSQL drivers.

  2. Right now, when the index goes out of bounds, it throws java.lang.ArrayIndexOutOfBoundsException, but according to the specification of java.sql.ResultSet, which the SparkConnectResultSet class implements, it should throw java.sql.SQLException:

     * @throws SQLException if the columnIndex is not valid;

Maybe we can create a separate JIRA to fix all the getter functions.

Member

@cty123 thanks for investigating it. I created SPARK-53484 for this; please go ahead and create a PR to fix it if you'd like.

@LuciferYang LuciferYang changed the title [SPARK-54205][CONNECT] Supports Decimal type data in SparkConnectResultSet. [SPARK-54205][CONNECT] Supports Decimal type data in SparkConnectResultSet Nov 10, 2025
LuciferYang pushed a commit that referenced this pull request Nov 10, 2025
…ltSet

### What changes were proposed in this pull request?

Spark Connect already supports the JDBC protocol for a few commonly used SQL data types, but it is currently missing support for the Decimal type, which is also very commonly used (for example, to store monetary values). This PR adds support for the Decimal data type.

### Why are the changes needed?

Right now, a user can read Decimal data over JDBC only by converting the data to a string and then parsing the string into a Java BigDecimal object. Since the driver already has the value available as a Java BigDecimal, we can avoid converting it back and forth and instead pass through the value obtained from the underlying result set.

### Does this PR introduce _any_ user-facing change?

It is part of a new feature under Spark Connect JDBC support.

### How was this patch tested?

I created a new unit test named **'get decimal type'** that covers my changes. The test case also aligns with the tests for fetching the other data types.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52947 from cty123/cty123/support-spark-connect-decimaltype.

Lead-authored-by: cty123 <ctychen2216@gmail.com>
Co-authored-by: cty <ctychen2216@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 73b50e9)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@LuciferYang
Contributor

Merged into master/branch-4.1. Thanks @cty123 and @pan3793

@LuciferYang
Contributor

@cty123 Could you please provide your Jira account name so that I can assign this ticket to you?

@cty123
Contributor Author

cty123 commented Nov 10, 2025

Hey, I don't have a JIRA account; maybe you can assign it to @pan3793?

@pan3793
Member

pan3793 commented Nov 10, 2025

@cty123, the Spark community uses this to credit your contribution. You can apply for a JIRA account; @LuciferYang should have permission to approve it.

@dongjoon-hyun
Member

Thank you, @cty123 and all!

LuciferYang pushed a commit that referenced this pull request Nov 12, 2025
… checkOpen and check index boundary

### What changes were proposed in this pull request?

This PR makes a minor correction to the current get* functions of the `SparkConnectResultSet` class. As previously discussed in PR #52947:

>1. For every getter function, if the statement is closed, the ResultSet should be unusable. I have verified this with MySQL driver and Postgresql driver.
>
>2. Right now when index goes out of bound, it throws  `java.lang.ArrayIndexOutOfBoundsException`, but based on the specification on `java.sql.ResultSet` which is implemented by `SparkConnectResultSet` class, it should throw `java.sql.SQLException`
>
>```
>     * throws SQLException if the columnIndex is not valid;
>```

This PR introduces a unified wrapper function, `getColumnValue(columnIndex: Int)`, that performs the `checkOpen` call as well as the index boundary check.
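A simplified sketch of the idea (not the exact merged code; the class and member names other than `getColumnValue` and `checkOpen` are illustrative):

```scala
import java.sql.SQLException
import org.apache.spark.sql.Row

// Standalone sketch of a unified guard shared by the get* accessors.
class ResultSetGuardSketch(currentRow: Row, isClosed: () => Boolean) {

  private def checkOpen(): Unit =
    if (isClosed()) throw new SQLException("JDBC Statement is closed.")

  private def getColumnValue[T](columnIndex: Int)(read: Int => T): T = {
    checkOpen()                                    // reject calls once the statement is closed
    if (columnIndex < 1 || columnIndex > currentRow.length) {
      // Translate what would be an ArrayIndexOutOfBoundsException into the
      // SQLException required by the java.sql.ResultSet contract.
      throw new SQLException(s"Invalid column index: $columnIndex")
    }
    read(columnIndex - 1)                          // JDBC indices are 1-based, Row's are 0-based
  }

  def getBigDecimal(columnIndex: Int): java.math.BigDecimal =
    getColumnValue(columnIndex) { i =>
      if (currentRow.isNullAt(i)) null else currentRow.getDecimal(i)
    }
}
```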

### Why are the changes needed?

Currently the get* functions don't follow the expected behavior of `java.sql.ResultSet`. It is not a big problem in practice, but since `SparkConnectResultSet` implements the `java.sql.ResultSet` interface, it should strictly follow the specification documented on the interface.

### Does this PR introduce _any_ user-facing change?

This PR is a small fix related to a new feature introduced recently.

### How was this patch tested?

I added two tests, each covering one of the bullet points above. Both tests call all the get* functions of the `SparkConnectResultSet` class to make sure the correct exception (`java.sql.SQLException`) is thrown.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52988 from cty123/cty123/address-spark-connect-getters.

Lead-authored-by: cty123 <ctychen2216@gmail.com>
Co-authored-by: cty <ctychen2216@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>