[SPARK-54270][CONNECT] SparkConnectResultSet get* methods should call checkOpen and check index boundary #52988
Conversation
| s"number of columns: ${currentRow.length}.") | ||
| } | ||
|
|
||
| Option(currentRow.get(columnIndex)) match { |
Use currentRow.isNullAt(columnIndex) instead of currentRow.get(columnIndex) == null for NULL testing. The former is the contract; the latter may return an undefined value in certain implementations when isNullAt(columnIndex) is true.
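For illustration, a minimal sketch of the suggested pattern (the helper name and its shape are hypothetical, not code from this PR):

```scala
import org.apache.spark.sql.Row

// Hypothetical helper: NULL testing via the Row contract (isNullAt) rather than
// comparing get(...) to null, which is not guaranteed to be null for SQL NULL.
def stringOrNull(row: Row, zeroBasedIndex: Int): String =
  if (row.isNullAt(zeroBasedIndex)) null else row.getString(zeroBasedIndex)
```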
I think this is also why the JDBC API has a wasNull method; otherwise, for a method like getBoolean, which returns a primitive value, the caller cannot distinguish between a SQL NULL and false.
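As a concrete illustration of the wasNull contract (plain java.sql API usage, not code from this PR):

```scala
import java.sql.ResultSet

// getBoolean returns a primitive, so a SQL NULL comes back as false; the caller
// can only tell the two apart by checking wasNull() right after the call.
def readNullableBoolean(rs: ResultSet, columnIndex: Int): Option[Boolean] = {
  val value = rs.getBoolean(columnIndex)
  if (rs.wasNull()) None else Some(value)
}
```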
Yeah, I agree. I had something different in mind while implementing this: currentRow.get(columnIndex) was called twice for all get* functions (once in isNullAt, and once in currentRow.get[Decimal|String|...]), so I was trying to see if I could reduce that. But it's likely a premature optimization; we can circle back once all the getter functions are implemented.
pan3793 left a comment
Thanks for working on this; I left some comments.
Also cc @LuciferYang
  withClue("SQLException is not thrown when the result set index goes out of bound") {
    intercept[SQLException] {
      getter(rs)
    }
Please check the returned message of the SQLException.
nit: I feel the logic here is simple and clear enough that withClue may not be required.
Removed withClue now, and added an assertion on the SQLException message.
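Roughly, the revised assertion could look like this (an illustrative excerpt; the exact message text and surrounding test scaffolding in the suite may differ):

```scala
import java.sql.SQLException
import org.scalatest.Assertions.intercept

// `rs` and `getter` come from the test, as in the diff above; the expected message
// prefix mirrors the SQLException text thrown by SparkConnectResultSet.
val e = intercept[SQLException] {
  getter(rs)
}
assert(e.getMessage.contains("The column index is out of range"))
```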
  Seq(
    ("'foo'", (rs: ResultSet) => rs.getString(1), "foo"),
    ("true", (rs: ResultSet) => rs.getBoolean(1), true),
    ("cast(1 as byte)", (rs: ResultSet) => rs.getByte(1), 1.toByte),
Can we have the SQL keywords (except for functions) in UPPER_CASE? e.g. cast(1 AS BYTE)
Changed them to upper case now.
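For instance, with upper-cased keywords the entries above would read roughly as follows (an illustrative fragment, not the full test table):

```scala
import java.sql.ResultSet

Seq(
  ("'foo'", (rs: ResultSet) => rs.getString(1), "foo"),
  ("true", (rs: ResultSet) => rs.getBoolean(1), true),
  ("CAST(1 AS BYTE)", (rs: ResultSet) => rs.getByte(1), 1.toByte))
```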
    }
  }

  private[jdbc] def getField[T](columnIndex: Int)(get: Int => T): Option[T] = {
This would be a hot path when retrieving a large query result set, so let's pass the default value as a parameter instead of returning an Option that the caller has to unwrap again.
nit: "field" is usually used for an object attribute; I feel getColumnValue or getVal is better here.
- private[jdbc] def getField[T](columnIndex: Int)(get: Int => T): Option[T] = {
+ private[jdbc] def getVal[T](columnIndex: Int, defaultVal: Any)(getter: Int => T): T = {
I don't like the name getField either; perhaps getColumnValue is better.
Also, let me check and experiment with which way is better for passing the default value.
Figured out a new way of passing the default value so that the getColumnValue function doesn't need to return an Option.
Yeah, it should return T instead of Option[T]; it's a typo in my suggestion, sorry for the confusion.
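To make the agreed-upon shape concrete, here is a sketch of the wrapper and one getter built on it (checkOpen and currentRow are assumed members of SparkConnectResultSet; this mirrors the discussion, not the exact final code):

```scala
import java.sql.SQLException

// Hypothetical sketch of the wrapper: checks that the result set is open, validates the
// 1-based JDBC column index, and returns defaultVal for SQL NULL instead of an Option.
private def getColumnValue[T](columnIndex: Int, defaultVal: T)(getter: Int => T): T = {
  checkOpen()
  if (columnIndex < 1 || columnIndex > currentRow.length) {
    throw new SQLException(s"The column index is out of range: $columnIndex, " +
      s"number of columns: ${currentRow.length}.")
  }
  val idx = columnIndex - 1 // JDBC indices are 1-based; the underlying Row is 0-based
  if (currentRow.isNullAt(idx)) defaultVal else getter(idx)
}

// A primitive getter then becomes a one-liner:
override def getBoolean(columnIndex: Int): Boolean =
  getColumnValue(columnIndex, false)(currentRow.getBoolean)
```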
…nnect/client/jdbc/SparkConnectJdbcDataTypeSuite.scala Co-authored-by: Cheng Pan <pan3793@gmail.com>
    checkOpen()
    if (columnIndex < 0 || columnIndex >= currentRow.length) {
      throw new SQLException(s"The column index is out of range: $columnIndex, " +
        s"number of columns: ${currentRow.length}.")
Small nit: in this method, columnIndex is 0-based. To avoid confusion, maybe we should consistently use a 1-based columnIndex in the JDBC context.
Oh, yes, the value in the error message should be the value the user passed in. I copied the PostgreSQL driver's error message:
rs.getString(999)
Caused by: org.postgresql.util.PSQLException: The column index is out of range: 999, number of columns: 2.
LGTM, except for the columnIndex in the SQLException message.
…nnect/client/jdbc/SparkConnectResultSet.scala Co-authored-by: Cheng Pan <pan3793@gmail.com>
…nnect/client/jdbc/SparkConnectResultSet.scala Co-authored-by: Cheng Pan <pan3793@gmail.com>
  private[jdbc] def getColumnValue[T](index: Int, defaultVal: T)(getter: Int => T): T = {
    checkOpen()
    // the passed index value is 1-indexed, but the underlying array is 0-indexed
    val columnIndex = index - 1
Can you swap the names columnIndex and index? Then columnIndex would be 1-based consistently throughout the source file.
done
@cty123 BTW, you can apply for a JIRA account at https://selfserve.apache.org/jira-account.html
    }
  }

  private[jdbc] def getColumnValue[T](columnIndex: Int, defaultVal: T)(getter: Int => T): T = {
Does this function need to be visible in the jdbc module?
No, I used private[jdbc] here because checkOpen is private[jdbc]. This function should really only be used inside the SparkConnectResultSet.scala file; I can remove this.
…nnect/client/jdbc/SparkConnectResultSet.scala Co-authored-by: YangJie <yangjie01@baidu.com>
[SPARK-54270][CONNECT] SparkConnectResultSet get* methods should call checkOpen and check index boundary

Closes #52988 from cty123/cty123/address-spark-connect-getters.

Lead-authored-by: cty123 <ctychen2216@gmail.com>
Co-authored-by: cty <ctychen2216@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit c21d5a4)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
What changes were proposed in this pull request?

This PR makes a minor correction to the current get* functions in the `SparkConnectResultSet` class. As previously discussed in PR #52947:

1. For every getter function, if the statement is closed, the ResultSet should be unusable. I have verified this with the MySQL and PostgreSQL drivers.
2. Right now, when the index goes out of bounds, it throws `java.lang.ArrayIndexOutOfBoundsException`, but based on the specification of `java.sql.ResultSet`, which `SparkConnectResultSet` implements, it should throw `java.sql.SQLException` ("throws SQLException if the columnIndex is not valid").

This PR proposes a unified wrapper function, `getColumnValue(columnIndex: Int)`, that wraps the `checkOpen` call as well as the index boundary check.

Why are the changes needed?

Currently the get* functions don't follow the expected behaviors of `java.sql.ResultSet`. It's technically not a big problem, but since `SparkConnectResultSet` aims to implement the `java.sql.ResultSet` interface, it should strictly follow the specification documented on the interface definition.

Does this PR introduce any user-facing change?

This PR is a small fix related to a recently introduced feature.

How was this patch tested?

I added two tests, each covering one of the bullet points above. These tests call all the get* functions inside the `SparkConnectResultSet` class to make sure the correct exception (`java.sql.SQLException`) is thrown.

Was this patch authored or co-authored using generative AI tooling?

No
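For illustration, a rough standalone sketch of the closed-statement scenario covered by the first test (the connection URL and query are placeholders, not the suite's actual fixtures):

```scala
import java.sql.{DriverManager, SQLException}
import scala.util.Try

// Placeholder Spark Connect JDBC URL; the real suite uses its own test harness.
val conn = DriverManager.getConnection("jdbc:sc://localhost:15002")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT 'foo'")
rs.next()
stmt.close() // closing the statement must render the ResultSet unusable

// Every getter should now throw java.sql.SQLException rather than returning data.
val thrown = Try(rs.getString(1)).failed.toOption
assert(thrown.exists(_.isInstanceOf[SQLException]),
  "expected SQLException from a getter after the statement was closed")
```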