-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-54350][SQL][STS] SparkGetColumnsOperation ORDINAL_POSITION should be 1-based #53062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @yaooqinn , @LuciferYang
| null, // SQL_DATETIME_SUB | ||
| null, // CHAR_OCTET_LENGTH | ||
| pos.asInstanceOf[AnyRef], // ORDINAL_POSITION | ||
| (pos + 1).asInstanceOf[AnyRef], // ORDINAL_POSITION, 1-based |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this looks like a legit fix to follow the standard, do you think we need to document this as a breaking change or provide a legacy conf, @pan3793 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for @dongjoon-hyun 's suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late reply. I think it should be low risk, AFAIK, this is mostly used by tools (RDBMS management GUI tools or BI tools) to sort the column, and the implementation is relatively lenient, but for safety, I can add a legacy config and fix this only for 4.1.
| null, // SQL_DATETIME_SUB | ||
| null, // CHAR_OCTET_LENGTH | ||
| pos.asInstanceOf[AnyRef], // ORDINAL_POSITION | ||
| (pos + 1).asInstanceOf[AnyRef], // ORDINAL_POSITION, 1-based |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this PR is breaking the existing BI tool integration.
|
@dongjoon-hyun @LuciferYang I added a config |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @pan3793 and @LuciferYang .
Merged to master/4.1 for Apache Spark 4.1.0.
…uld be 1-based ### What changes were proposed in this pull request? The SparkGetColumnsOperation is mainly used for the JDBC driver, while JDBC uses 1-based ordinal/column-index instead of 0-based. This is also documented in Hive API. https://github.com/apache/spark/blob/551b922a53acfdfeb2c065d5dedf35cb8cd30e1d/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java#L94-L95 Note, the GetColumnsOperation, which is originally copied from the Hive has a correct implementation, the issue only exists in SparkGetColumnsOperation. For safety, a config `spark.sql.legacy.hive.thriftServer.useZeroBasedColumnOrdinalPosition` is added to allow the user to switch back to the previous behavior. ### Why are the changes needed? The SparkGetColumnsOperation is mainly used by JDBC [java.sql.DatabaseMetaData#getColumns](https://docs.oracle.com/en/java/javase/17/docs/api/java.sql/java/sql/DatabaseMetaData.html#getColumns(java.lang.String,java.lang.String,java.lang.String,java.lang.String)), this change makes it satisfy the JDBC API specification. ### Does this PR introduce _any_ user-facing change? Yes, see the above section. ### How was this patch tested? UTs are modified. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #53062 from pan3793/SPARK-54350. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 05bc5d4) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
The SparkGetColumnsOperation is mainly used for the JDBC driver, while JDBC uses 1-based ordinal/column-index instead of 0-based.
This is also documented in Hive API.
spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
Lines 94 to 95 in 551b922
Note, the GetColumnsOperation, which is originally copied from the Hive has a correct implementation, the issue only exists in SparkGetColumnsOperation.
For safety, a config
spark.sql.legacy.hive.thriftServer.useZeroBasedColumnOrdinalPositionis added to allow the user to switch back to the previous behavior.Why are the changes needed?
The SparkGetColumnsOperation is mainly used by JDBC java.sql.DatabaseMetaData#getColumns, this change makes it satisfy the JDBC API specification.
Does this PR introduce any user-facing change?
Yes, see the above section.
How was this patch tested?
UTs are modified.
Was this patch authored or co-authored using generative AI tooling?
No.