Conversation

Contributor

@cty123 cty123 commented Nov 11, 2025

What changes were proposed in this pull request?

This PR makes a minor correction to the current get* functions in the SparkConnectResultSet class. As previously discussed in PR #52947:

  1. For every getter function, if the statement is closed, the ResultSet should be unusable. I have verified this behavior with the MySQL and PostgreSQL drivers.

  2. Right now, when the index goes out of bounds, it throws java.lang.ArrayIndexOutOfBoundsException, but based on the specification of java.sql.ResultSet, which the SparkConnectResultSet class implements, it should throw java.sql.SQLException:

    * @throws SQLException if the columnIndex is not valid;

This PR proposes a unified wrapper function, getColumnValue(columnIndex: Int), that wraps the checkOpen call as well as the index boundary check.
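For illustration, a minimal sketch of such a wrapper (checkOpen and currentRow are existing members of SparkConnectResultSet; this is a sketch of the idea, not the exact merged diff):

```scala
import java.sql.SQLException

// Sketch: every get* method delegates here so the open-check and the
// index boundary check live in one place.
private def getColumnValue[T](columnIndex: Int)(getter: Int => T): T = {
  checkOpen() // a ResultSet backed by a closed statement must be unusable
  if (columnIndex < 0 || columnIndex >= currentRow.length) {
    throw new SQLException(s"The column index is out of range: $columnIndex, " +
      s"number of columns: ${currentRow.length}.")
  }
  getter(columnIndex)
}
```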

Why are the changes needed?

Currently the get* functions don't follow the expected behavior of java.sql.ResultSet. It's technically not a big problem, but since SparkConnectResultSet aims to implement the java.sql.ResultSet interface, it should strictly follow the specification documented on the interface definition.

Does this PR introduce any user-facing change?

This PR is a small fix related to a new feature introduced recently.

How was this patch tested?

I added two tests, each covering one of the bullet points above. These tests call all the get* functions inside the SparkConnectResultSet class to make sure the correct exception (java.sql.SQLException) is thrown.

Was this patch authored or co-authored using generative AI tooling?

No

s"number of columns: ${currentRow.length}.")
}

Option(currentRow.get(columnIndex)) match {
Member

Use currentRow.isNullAt(columnIndex) instead of currentRow.get(columnIndex) == null for NULL testing. The former is the contract, while the latter may return an undefined value in certain implementations when isNullAt(columnIndex) is true.
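For example, a NULL check written against that contract might look like this (sketch only; the 0-based index and the getString accessor are illustrative):

```scala
// isNullAt is the defined contract for NULL; comparing get(columnIndex) to null
// may observe an undefined value in some implementations when the slot is NULL.
val value: String =
  if (currentRow.isNullAt(columnIndex)) null
  else currentRow.getString(columnIndex)
```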

Member

I think this is also why the JDBC API has a wasNull method; otherwise, for a method like getBoolean, which returns a primitive value, there is no way to distinguish between NULL and false.
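A sketch of the standard caller-side pattern on a plain java.sql.ResultSet:

```scala
// getBoolean returns false for SQL NULL, so wasNull() is the only way
// to tell a real false apart from NULL.
val flag: Boolean = rs.getBoolean(1)
val wasSqlNull: Boolean = rs.wasNull()
```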

Contributor Author

Yeah, I agree. I had something different in mind while I was implementing this: currentRow.get(columnIndex) was called twice for every get* function (once in isNullAt, and once in the currentRow.get[Decimal|String|...] call), so I was trying to see whether I could reduce that. But it's likely premature optimization. We can circle back once all the getter functions are implemented.

Member

@pan3793 pan3793 left a comment

Thanks for working on this; left some comments.

also cc @LuciferYang

withClue("SQLException is not thrown when the result set index goes out of bound") {
  intercept[SQLException] {
    getter(rs)
  }
}
Member

@pan3793 pan3793 Nov 11, 2025

Please check the returned message of the SQLException.

nit: I feel the logic here is simple and clear enough; withClue may not be required.

Contributor Author

Removed withClue now, and added an assertion on the SQLException message.
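The reworked check then looks roughly like this (sketch; the exact assertion text mirrors the error message discussed further down):

```scala
val e = intercept[SQLException] {
  getter(rs)
}
assert(e.getMessage.contains("The column index is out of range"))
```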

Seq(
  ("'foo'", (rs: ResultSet) => rs.getString(1), "foo"),
  ("true", (rs: ResultSet) => rs.getBoolean(1), true),
  ("cast(1 as byte)", (rs: ResultSet) => rs.getByte(1), 1.toByte),
Member

Can we have SQL keywords (except for functions) in UPPER_CASE? e.g. cast(1 AS BYTE)

Contributor Author

Changed them to upper case now.
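For instance, the rows above now read (illustrative, mirroring the reviewer's suggested casing):

```scala
Seq(
  ("'foo'", (rs: ResultSet) => rs.getString(1), "foo"),
  ("true", (rs: ResultSet) => rs.getBoolean(1), true),
  ("cast(1 AS BYTE)", (rs: ResultSet) => rs.getByte(1), 1.toByte))
```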

}
}

private[jdbc] def getField[T](columnIndex: Int)(get: Int => T): Option[T] = {
Member

@pan3793 pan3793 Nov 11, 2025

This would be a hot path when retrieving a large query result set, so let's pass the default value as a parameter instead of returning an Option that the caller has to unwrap again.

nit: "field" is usually used for an object attribute; I feel getColumnValue or getVal is better here.

Suggested change
private[jdbc] def getField[T](columnIndex: Int)(get: Int => T): Option[T] = {
private[jdbc] def getVal[T](columnIndex: Int, defaultVal: Any)(getter: Int => T): T = {

Contributor Author

I don't like the name getField either; I think perhaps getColumnValue is better.

Contributor Author

Also, let me check and experiment with which way is better for passing the default value.

Contributor Author

Figured out a new way of passing the default value so that the getColumnValue function doesn't need to return an Option.

Member

Yeah, it should return T instead of Option[T]; it's a typo in my suggestion, sorry for the confusion.

checkOpen()
if (columnIndex < 0 || columnIndex >= currentRow.length) {
  throw new SQLException(s"The column index is out of range: $columnIndex, " +
    s"number of columns: ${currentRow.length}.")
Member

small nit:

In this method, columnIndex is 0-based. To avoid confusion, maybe we should consistently use 1-based columnIndex in the JDBC context.

Contributor Author

@cty123 cty123 Nov 11, 2025

Oh, yes, the value in the error message should be the value the user passed in. I copied the PostgreSQL driver's error message:

rs.getString(999)

Caused by: org.postgresql.util.PSQLException: The column index is out of range: 999, number of columns: 2.

@pan3793
Member

pan3793 commented Nov 11, 2025

LGTM, except for the columnIndex in the SQLException message.

cty123 and others added 3 commits November 11, 2025 00:17
…nnect/client/jdbc/SparkConnectResultSet.scala

Co-authored-by: Cheng Pan <pan3793@gmail.com>
…nnect/client/jdbc/SparkConnectResultSet.scala

Co-authored-by: Cheng Pan <pan3793@gmail.com>
private[jdbc] def getColumnValue[T](index: Int, defaultVal: T)(getter: Int => T): T = {
  checkOpen()
  // the passed index value is 1-indexed, but the underlying array is 0-indexed
  val columnIndex = index - 1
Member

@pan3793 pan3793 Nov 11, 2025

Can you swap the names columnIndex and index? Then columnIndex would be 1-based consistently throughout the whole source file.

Contributor Author

done
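After the rename, the wrapper reads roughly like this (a sketch combining the snippets in this thread, not the exact merged diff; the error message reports the 1-based index the user passed in):

```scala
private[jdbc] def getColumnValue[T](columnIndex: Int, defaultVal: T)(getter: Int => T): T = {
  checkOpen()
  // columnIndex is 1-based per JDBC; the underlying row array is 0-based
  val index = columnIndex - 1
  if (index < 0 || index >= currentRow.length) {
    throw new SQLException(s"The column index is out of range: $columnIndex, " +
      s"number of columns: ${currentRow.length}.")
  }
  if (currentRow.isNullAt(index)) defaultVal else getter(index)
}
```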

@pan3793
Member

pan3793 commented Nov 11, 2025

@cty123 BTW, you can apply for a JIRA account at https://selfserve.apache.org/jira-account.html

}
}

private[jdbc] def getColumnValue[T](columnIndex: Int, defaultVal: T)(getter: Int => T): T = {
Contributor

@LuciferYang LuciferYang Nov 11, 2025

Does this function need to be visible to the whole jdbc module?

Contributor Author

No, I used private[jdbc] here because checkOpen is private[jdbc]. This function should really only be used inside the SparkConnectResultSet.scala file. I can remove this.

cty123 and others added 2 commits November 11, 2025 10:12
…nnect/client/jdbc/SparkConnectResultSet.scala

Co-authored-by: YangJie <yangjie01@baidu.com>
@cty123 cty123 force-pushed the cty123/address-spark-connect-getters branch from 27466c2 to eecc9df on November 11, 2025 15:15
LuciferYang pushed a commit that referenced this pull request Nov 12, 2025
… checkOpen and check index boundary

Closes #52988 from cty123/cty123/address-spark-connect-getters.

Lead-authored-by: cty123 <ctychen2216@gmail.com>
Co-authored-by: cty <ctychen2216@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit c21d5a4)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@LuciferYang
Copy link
Contributor

Merged into master and branch-4.1. Thanks @cty123 and @pan3793

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
… checkOpen and check index boundary
