[SPARK-54205][CONNECT] Supports Decimal type data in SparkConnectResultSet #52947
Conversation
@pan3793 I see you have been working on this story, perhaps it's better that you review it.
```scala
case DoubleType => 24
case StringType =>
  getPrecision(field)
case DecimalType.Fixed(p, s) => p + (if (s == 0) 0 else 1)
```
I think we should consider more cases, for example,
```scala
// scale = precision => only fractional digits. +3 for decimal point, sign and leading zero
case DecimalType.Fixed(p, s) if s == p => p + 3
// scale = 0 => only integral part. +1 for sign
case DecimalType.Fixed(p, s) if s == 0 => p + 1
// scale > 0 => both integral and fractional part. +2 for sign and decimal point
case DecimalType.Fixed(p, s) => p + 2
```
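For intuition, a tiny sanity check of the worst-case rendered widths these three cases encode (the sample values below are made up for illustration):

```scala
// Worst-case rendered widths for each DecimalType shape (illustrative values only).
assert("-0.12345".length == 5 + 3) // DECIMAL(5,5): sign + leading zero + decimal point
assert("-12345".length == 5 + 1)   // DECIMAL(5,0): sign only
assert("-123.45".length == 5 + 2)  // DECIMAL(5,2): sign + decimal point
```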
Oh, that's a good point, I totally forgot the sign and the leading-zero scenarios. I will also add some test cases to cover this.
Updated the logic for the edge cases, and added a few test cases to cover them.
@cty123 thank you for working on this new feature.
…nnect/client/jdbc/util/JdbcTypeUtils.scala Co-authored-by: Cheng Pan <pan3793@gmail.com>
```scala
case StringType =>
  getPrecision(field)
case DecimalType.Fixed(p, s) => p + (if (s == 0) 0 else 1)
// precision + sign(+/-) + leading zero + decimal point, like DECIMAL(5,5) = -0.12345
```
The `-` sign displays, but `+` does not.
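For example (illustrative values):

```scala
// java.math.BigDecimal renders a leading '-' but never a leading '+'.
assert(new java.math.BigDecimal("-1.5").toPlainString == "-1.5")
assert(new java.math.BigDecimal("+1.5").toPlainString == "1.5")
```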
I updated the comment to make it clearer that the extra character is for the negative sign.
pan3793 left a comment:
LGTM, cc @LuciferYang, could you take a look?
```diff
-override def getBigDecimal(columnIndex: Int): java.math.BigDecimal =
-  throw new SQLFeatureNotSupportedException
+override def getBigDecimal(columnIndex: Int): java.math.BigDecimal = {
+  if (currentRow.isNullAt(columnIndex - 1)) {
```
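For context, a minimal sketch of how the complete method might read, assuming `currentRow` is the underlying Spark `Row` and a `_wasNull` flag backs `wasNull()` (the actual field names in `SparkConnectResultSet` may differ):

```scala
override def getBigDecimal(columnIndex: Int): java.math.BigDecimal = {
  if (currentRow.isNullAt(columnIndex - 1)) {
    _wasNull = true // assumed flag backing ResultSet.wasNull()
    null
  } else {
    _wasNull = false
    // Spark's Row.getDecimal already returns java.math.BigDecimal, so the value can be passed through.
    currentRow.getDecimal(columnIndex - 1)
  }
}
```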
- Should we first check `checkOpen()`?
- Should we ensure that `columnIndex` is not out of bounds?
also cc @pan3793
Makes sense, but I think we can defer this to an independent patch, since other get* methods have the same issue.
1. I am experimenting with this against other JDBC driver implementations.
2. Currently, if the index is out of bounds, it throws an out-of-bounds exception:
```
[info] - get decimal type *** FAILED *** (78 milliseconds)
[info]   java.lang.ArrayIndexOutOfBoundsException: Index 998 out of bounds for length 1
[info]   at org.apache.spark.sql.catalyst.expressions.GenericRow.get(rows.scala:37)
[info]   at org.apache.spark.sql.Row.isNullAt(Row.scala:216)
[info]   at org.apache.spark.sql.Row.isNullAt$(Row.scala:216)
[info]   at org.apache.spark.sql.catalyst.expressions.GenericRow.isNullAt(rows.scala:28)
```
Do we still need to check it? Or should we catch it and wrap it in another exception?

And yes, I think we can have a separate PR to address the two issues. For the second one, if we check the bounds ourselves, I would do it inside the `isNullAt` function instead of calling another checker function inside each getter function.
Handling this issue uniformly in a separate PR is fine for me.
I just tested, you are right.
1. For every getter function, if the statement is closed, the ResultSet should be unusable. I have verified this with the MySQL driver and the PostgreSQL driver.
2. Right now, when the index goes out of bounds, it throws `java.lang.ArrayIndexOutOfBoundsException`, but based on the specification of `java.sql.ResultSet`, which is implemented by the `SparkConnectResultSet` class, it should throw `java.sql.SQLException`:

```
 * @throws SQLException if the columnIndex is not valid;
```

Maybe we can create a separate JIRA ticket to fix all the getter functions.
@cty123 thanks for investigating it. I created SPARK-53484 for this; please go ahead and create a PR to fix it if you'd like to.
…ltSet

### What changes were proposed in this pull request?

Spark Connect has supported the JDBC protocol with a few commonly used SQL data types, but it is currently missing support for Decimal data, which is also very commonly used to store monetary values. I would like to have it support the Decimal data type.

### Why are the changes needed?

Right now, a user is able to read Decimal data from SQL by converting the data to a string and then parsing the string into a Java BigDecimal object. But since the JDBC driver is already able to fetch the data as a Java BigDecimal, we can save the effort of converting it back and forth and instead pass through the data we obtain from the raw JDBC result set.

### Does this PR introduce _any_ user-facing change?

It's part of a new feature under Spark Connect JDBC support.

### How was this patch tested?

I have created a new unit test named **'get decimal type'** and it covers my changes. The test case also aligns with the tests for fetching other data types.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52947 from cty123/cty123/support-spark-connect-decimaltype.

Lead-authored-by: cty123 <ctychen2216@gmail.com>
Co-authored-by: cty <ctychen2216@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 73b50e9)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@cty123 Could you please provide your Jira account name so that I can assign this ticket to you?
Hey, I don't have a JIRA account, maybe you can assign it to @pan3793?
@cty123, the Spark community uses this to credit your contribution. You can apply for a JIRA account; @LuciferYang should have permission for approval.
Thank you, @cty123 and all!
… checkOpen and check index boundary

### What changes were proposed in this pull request?

This PR aims to do a minor correction on the current get* functions of the `SparkConnectResultSet` class. As previously discussed in PR #52947:

> 1. For every getter function, if the statement is closed, the ResultSet should be unusable. I have verified this with the MySQL driver and the PostgreSQL driver.
>
> 2. Right now, when the index goes out of bounds, it throws `java.lang.ArrayIndexOutOfBoundsException`, but based on the specification of `java.sql.ResultSet`, which is implemented by the `SparkConnectResultSet` class, it should throw `java.sql.SQLException`:
>
> ```
>  * @throws SQLException if the columnIndex is not valid;
> ```

This PR proposes a unified wrapper function called `getColumnValue(columnIndex: Int)` that wraps the `checkOpen` call as well as the index boundary check.

### Why are the changes needed?

Currently the get* functions don't follow the expected behaviors of `java.sql.ResultSet`. It's technically not a big problem, but since `SparkConnectResultSet` aims to implement the `java.sql.ResultSet` interface, it should strictly follow the specification documented on the interface definition.

### Does this PR introduce _any_ user-facing change?

This PR is a small fix related to a new feature introduced recently.

### How was this patch tested?

I added 2 tests, each covering one of the bullet points above. These tests call all the get* functions inside the `SparkConnectResultSet` class to make sure the correct exception (java.sql.SQLException) is thrown.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52988 from cty123/cty123/address-spark-connect-getters.

Lead-authored-by: cty123 <ctychen2216@gmail.com>
Co-authored-by: cty <ctychen2216@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
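A rough sketch of the idea behind that wrapper, not the exact code merged in #52988 (`currentRow`, the generic signature, and the error message are assumptions):

```scala
import java.sql.SQLException

// Unify the open-check and bounds-check that every getter needs.
private def getColumnValue[T](columnIndex: Int)(getter: Int => T): T = {
  checkOpen() // the ResultSet must not be used once the statement is closed
  if (columnIndex < 1 || columnIndex > currentRow.length) {
    // JDBC mandates SQLException rather than ArrayIndexOutOfBoundsException here
    throw new SQLException(s"The column index is out of range: $columnIndex")
  }
  getter(columnIndex - 1) // Spark Row is 0-based, JDBC columns are 1-based
}
```

A getter such as `getBigDecimal` could then delegate to it, for example `getColumnValue(columnIndex)(currentRow.getDecimal)`, keeping its null handling on top.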
### What changes were proposed in this pull request?

Spark Connect has supported the JDBC protocol with a few commonly used SQL data types, but it is currently missing support for Decimal data, which is also very commonly used to store monetary values. I would like to have it support the Decimal data type.
### Why are the changes needed?

Right now, a user is able to read Decimal data from SQL by converting the data to a string and then parsing the string into a Java BigDecimal object. But since the JDBC driver is already able to fetch the data as a Java BigDecimal, we can save the effort of converting it back and forth and instead pass through the data we obtain from the raw JDBC result set.
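For example, from a client's point of view (illustrative code; the connection URL and query are hypothetical and not taken from the test suite):

```scala
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:sc://localhost:15002") // hypothetical URL
val stmt = conn.createStatement()

// Old workaround: cast to string server-side, parse back into BigDecimal client-side.
val rs1 = stmt.executeQuery("SELECT CAST(CAST(1234.56 AS DECIMAL(10, 2)) AS STRING)")
rs1.next()
val parsed = new java.math.BigDecimal(rs1.getString(1))

// With this change: fetch the DECIMAL column directly as java.math.BigDecimal.
val rs2 = stmt.executeQuery("SELECT CAST(1234.56 AS DECIMAL(10, 2))")
rs2.next()
val direct = rs2.getBigDecimal(1)

assert(parsed.compareTo(direct) == 0)
```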
### Does this PR introduce _any_ user-facing change?

It's part of a new feature under Spark Connect JDBC support.
### How was this patch tested?

I have created a new unit test named **'get decimal type'** and it covers my changes. The test case also aligns with the tests for fetching other data types.
### Was this patch authored or co-authored using generative AI tooling?

No