[SPARK-54270][CONNECT] SparkConnectResultSet get* methods should call checkOpen and check index boundary #52988
Conversation
| s"number of columns: ${currentRow.length}.") | ||
| } | ||
|
|
||
| Option(currentRow.get(columnIndex)) match { |
Use currentRow.isNullAt(columnIndex) instead of currentRow.get(columnIndex) == null for NULL testing. The former is the contract; the latter may return an undefined value in certain implementations when isNullAt(columnIndex) is true.
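For illustration, a minimal sketch of the suggested pattern (the helper name and its shape are hypothetical, not code from this PR):

```scala
import org.apache.spark.sql.Row

// Hypothetical helper: NULL testing via the Row contract (isNullAt) rather than
// comparing get(...) to null, which is not guaranteed to be null for SQL NULL.
def stringOrNull(row: Row, zeroBasedIndex: Int): String =
  if (row.isNullAt(zeroBasedIndex)) null else row.getString(zeroBasedIndex)
```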
I think this is also why the JDBC API has a wasNull method; otherwise, for a method like getBoolean, which returns a primitive value, the caller cannot distinguish between a SQL NULL and false.
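As a concrete illustration of the wasNull contract (plain java.sql API usage, not code from this PR):

```scala
import java.sql.ResultSet

// getBoolean returns a primitive, so a SQL NULL comes back as false; the caller
// can only tell the two apart by checking wasNull() right after the call.
def readNullableBoolean(rs: ResultSet, columnIndex: Int): Option[Boolean] = {
  val value = rs.getBoolean(columnIndex)
  if (rs.wasNull()) None else Some(value)
}
```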
Yeah, I agree. I had something different in mind while implementing this: currentRow.get(columnIndex) was called twice for all get* functions (once in isNullAt, and once in currentRow.get[Decimal|String|...]), so I was trying to see if I could reduce that. But it's likely a premature optimization; we can circle back once all the getter functions are implemented.
pan3793 left a comment
Thanks for working on this; I left some comments.
Also cc @LuciferYang
  withClue("SQLException is not thrown when the result set index goes out of bound") {
    intercept[SQLException] {
      getter(rs)
    }
Please check the returned message of the SQLException.
nit: I feel the logic here is simple and clear enough that withClue may not be required.
Removed withClue now, and added an assertion on the SQLException message.
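Roughly, the revised assertion could look like this (an illustrative excerpt; the exact message text and surrounding test scaffolding in the suite may differ):

```scala
import java.sql.SQLException
import org.scalatest.Assertions.intercept

// `rs` and `getter` come from the test, as in the diff above; the expected message
// prefix mirrors the SQLException text thrown by SparkConnectResultSet.
val e = intercept[SQLException] {
  getter(rs)
}
assert(e.getMessage.contains("The column index is out of range"))
```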
  Seq(
    ("'foo'", (rs: ResultSet) => rs.getString(1), "foo"),
    ("true", (rs: ResultSet) => rs.getBoolean(1), true),
    ("cast(1 as byte)", (rs: ResultSet) => rs.getByte(1), 1.toByte),
Can we have the SQL keywords (except for functions) in UPPER_CASE? e.g. cast(1 AS BYTE)
Changed them to upper case now.
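For instance, with upper-cased keywords the entries above would read roughly as follows (an illustrative fragment, not the full test table):

```scala
import java.sql.ResultSet

Seq(
  ("'foo'", (rs: ResultSet) => rs.getString(1), "foo"),
  ("true", (rs: ResultSet) => rs.getBoolean(1), true),
  ("CAST(1 AS BYTE)", (rs: ResultSet) => rs.getByte(1), 1.toByte))
```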
    }
  }

  private[jdbc] def getField[T](columnIndex: Int)(get: Int => T): Option[T] = {
This would be a hot path when retrieving a large query result set, so let's pass the default value as a parameter instead of returning an Option that the caller has to unwrap again.
nit: "field" is usually used for an object attribute; I feel getColumnValue or getVal is better here.
- private[jdbc] def getField[T](columnIndex: Int)(get: Int => T): Option[T] = {
+ private[jdbc] def getVal[T](columnIndex: Int, defaultVal: Any)(getter: Int => T): T = {
I don't like the name getField either; perhaps getColumnValue is better.
Also, let me check and experiment with which way is better for passing the default value.
Figured out a new way of passing the default value so that the getColumnValue function doesn't need to return an Option.
Yeah, it should return T instead of Option[T]; it's a typo in my suggestion, sorry for the confusion.
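To make the agreed-upon shape concrete, here is a sketch of the wrapper and one getter built on it (checkOpen and currentRow are assumed members of SparkConnectResultSet; this mirrors the discussion, not the exact final code):

```scala
import java.sql.SQLException

// Hypothetical sketch of the wrapper: checks that the result set is open, validates the
// 1-based JDBC column index, and returns defaultVal for SQL NULL instead of an Option.
private def getColumnValue[T](columnIndex: Int, defaultVal: T)(getter: Int => T): T = {
  checkOpen()
  if (columnIndex < 1 || columnIndex > currentRow.length) {
    throw new SQLException(s"The column index is out of range: $columnIndex, " +
      s"number of columns: ${currentRow.length}.")
  }
  val idx = columnIndex - 1 // JDBC indices are 1-based; the underlying Row is 0-based
  if (currentRow.isNullAt(idx)) defaultVal else getter(idx)
}

// A primitive getter then becomes a one-liner:
override def getBoolean(columnIndex: Int): Boolean =
  getColumnValue(columnIndex, false)(currentRow.getBoolean)
```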
…nnect/client/jdbc/SparkConnectJdbcDataTypeSuite.scala Co-authored-by: Cheng Pan <pan3793@gmail.com>
    checkOpen()
    if (columnIndex < 0 || columnIndex >= currentRow.length) {
      throw new SQLException(s"The column index is out of range: $columnIndex, " +
        s"number of columns: ${currentRow.length}.")
Small nit: in this method, columnIndex is 0-based. To avoid confusion, maybe we should consistently use a 1-based columnIndex in the JDBC context.
Oh, yes, the value in the error message should be the value the user passed in. I copied the PostgreSQL driver's error message:
rs.getString(999)
Caused by: org.postgresql.util.PSQLException: The column index is out of range: 999, number of columns: 2.
LGTM, except for the columnIndex in the SQLException message.
…nnect/client/jdbc/SparkConnectResultSet.scala Co-authored-by: Cheng Pan <pan3793@gmail.com>
…nnect/client/jdbc/SparkConnectResultSet.scala Co-authored-by: Cheng Pan <pan3793@gmail.com>
  private[jdbc] def getColumnValue[T](index: Int, defaultVal: T)(getter: Int => T): T = {
    checkOpen()
    // the passed index value is 1-indexed, but the underlying array is 0-indexed
    val columnIndex = index - 1
Can you swap the names columnIndex and index? Then columnIndex would be 1-based consistently throughout the source file.
done
@cty123 BTW, you can apply for a JIRA account at https://selfserve.apache.org/jira-account.html
    }
  }

  private[jdbc] def getColumnValue[T](columnIndex: Int, defaultVal: T)(getter: Int => T): T = {
Does this function need to be visible in the jdbc module?
No, I used private[jdbc] here because checkOpen is private[jdbc]. This function should really only be used inside the SparkConnectResultSet.scala file; I can remove this.
…nnect/client/jdbc/SparkConnectResultSet.scala Co-authored-by: YangJie <yangjie01@baidu.com>
[SPARK-54270][CONNECT] SparkConnectResultSet get* methods should call checkOpen and check index boundary

Closes #52988 from cty123/cty123/address-spark-connect-getters.

Lead-authored-by: cty123 <ctychen2216@gmail.com>
Co-authored-by: cty <ctychen2216@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit c21d5a4)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
What changes were proposed in this pull request?

This PR makes a minor correction to the current get* functions in the `SparkConnectResultSet` class. As previously discussed in PR #52947:

1. For every getter function, if the statement is closed, the ResultSet should be unusable. I have verified this with the MySQL and PostgreSQL drivers.
2. Right now, when the index goes out of bounds, it throws `java.lang.ArrayIndexOutOfBoundsException`, but based on the specification of `java.sql.ResultSet`, which `SparkConnectResultSet` implements, it should throw `java.sql.SQLException` ("throws SQLException if the columnIndex is not valid").

This PR proposes a unified wrapper function, `getColumnValue(columnIndex: Int)`, that wraps the `checkOpen` call as well as the index boundary check.

Why are the changes needed?

Currently the get* functions don't follow the expected behaviors of `java.sql.ResultSet`. It's technically not a big problem, but since `SparkConnectResultSet` aims to implement the `java.sql.ResultSet` interface, it should strictly follow the specification documented on the interface definition.

Does this PR introduce any user-facing change?

This PR is a small fix related to a recently introduced feature.

How was this patch tested?

I added two tests, each covering one of the bullet points above. These tests call all the get* functions inside the `SparkConnectResultSet` class to make sure the correct exception (`java.sql.SQLException`) is thrown.

Was this patch authored or co-authored using generative AI tooling?

No
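For illustration, a rough standalone sketch of the closed-statement scenario covered by the first test (the connection URL and query are placeholders, not the suite's actual fixtures):

```scala
import java.sql.{DriverManager, SQLException}
import scala.util.Try

// Placeholder Spark Connect JDBC URL; the real suite uses its own test harness.
val conn = DriverManager.getConnection("jdbc:sc://localhost:15002")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT 'foo'")
rs.next()
stmt.close() // closing the statement must render the ResultSet unusable

// Every getter should now throw java.sql.SQLException rather than returning data.
val thrown = Try(rs.getString(1)).failed.toOption
assert(thrown.exists(_.isInstanceOf[SQLException]),
  "expected SQLException from a getter after the statement was closed")
```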