[SPARK-39041][SQL] Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly #36373
Conversation
```scala
          Map(
            TCLIServiceConstants.PRECISION -> TTypeQualifierValue.i32Value(d.precision),
            TCLIServiceConstants.SCALE -> TTypeQualifierValue.i32Value(d.scale)).asJava
        case _ => Collections.emptyMap[String, TTypeQualifierValue]()
```
If char/varchar can be seen via the result-set metadata, we shall also provide the char length here so the client can take advantage of it.
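The reviewer's suggestion could follow the same pattern as the decimal branch quoted below. A minimal, self-contained sketch of the idea in plain Java; every class name and constant key here (`TypeQualifierSketch`, `CHARACTER_MAXIMUM_LENGTH`, etc.) is a simplified stand-in for the real Hive Thrift API, not the actual classes:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class TypeQualifierSketch {
    // Hypothetical stand-ins for TCLIServiceConstants qualifier keys.
    static final String PRECISION = "precision";
    static final String SCALE = "scale";
    static final String CHARACTER_MAXIMUM_LENGTH = "characterMaximumLength";

    // Existing behavior: decimal(precision, scale) exposes two qualifiers.
    static Map<String, Integer> decimalQualifiers(int precision, int scale) {
        Map<String, Integer> q = new HashMap<>();
        q.put(PRECISION, precision);
        q.put(SCALE, scale);
        return q;
    }

    // Suggested addition: char(n)/varchar(n) exposes its max length, so
    // clients can report column display size or size buffers correctly.
    static Map<String, Integer> charQualifiers(int length) {
        return Collections.singletonMap(CHARACTER_MAXIMUM_LENGTH, length);
    }
}
```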
```scala
    case StringType => TTypeId.STRING_TYPE
    case _: DecimalType => TTypeId.DECIMAL_TYPE
    case DateType => TTypeId.DATE_TYPE
    // TODO: Shall use TIMESTAMPLOCALTZ_TYPE, keep AS-IS now for
```
TimestampType keeps being converted to TTypeId.TIMESTAMP_TYPE in this PR.
```diff
@@ -35,7 +36,7 @@ public SessionHandle openSession(String username, String password)
   }

   @Override
-  public RowSet fetchResults(OperationHandle opHandle) throws HiveSQLException {
+  public TRowSet fetchResults(OperationHandle opHandle) throws HiveSQLException {
```
CLIServiceClient and its children are only used in our tests, so the changes here will not impact real-world clients.
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala
```scala
  private def toTColumnValue(
      ordinal: Int,
      row: Row,
```
Not related to this PR, but it seems more efficient if we can convert InternalRow directly to TRowSet.
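The efficiency idea in this comment is to skip the intermediate external Row and build the column-oriented Thrift payload in one pass over the data. A toy sketch in plain Java, where `Object[]` stands in for a row and a `List` per column stands in for a `TColumn` (these stand-ins are assumptions for illustration, not Spark or Hive classes):

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnarSketch {
    // Transpose row-oriented data into column-oriented lists,
    // touching every cell exactly once and doing no per-row
    // conversion to an intermediate row representation.
    static List<List<Object>> toColumns(List<Object[]> rows, int numCols) {
        List<List<Object>> cols = new ArrayList<>(numCols);
        for (int c = 0; c < numCols; c++) {
            cols.add(new ArrayList<>(rows.size()));
        }
        for (Object[] row : rows) {
            for (int c = 0; c < numCols; c++) {
                cols.get(c).add(row[c]);
            }
        }
        return cols;
    }
}
```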
```diff
-  RowSet fetchResults(OperationHandle opHandle, FetchOrientation orientation,
-      long maxRows, FetchType fetchType) throws HiveSQLException;
+  TRowSet fetchResults(OperationHandle opHandle, FetchOrientation orientation,
+      long maxRows, FetchType fetchType) throws HiveSQLException;
```
the previous indentation was correct.
```scala
    // TODO: Shall use TIMESTAMPLOCALTZ_TYPE, keep AS-IS now for
    // unnecessary behavior change
    case TimestampType => TTypeId.TIMESTAMP_TYPE
    case TimestampNTZType => TTypeId.TIMESTAMP_TYPE
```
Maybe I'm missing some context here. Looking at RowSetUtils, datetime types use String as the result. Where do we use TTypeId?
This is for the row-set schema, which is where TTypeId is used. On the JDBC client side, getString returns the raw string and getObject returns a Java object.
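As an illustration of what a client can do with such a string-encoded datetime (this is not the Hive JDBC driver's actual code, just a sketch of the getObject-style path): the JDBC spec's escape format for timestamps is exactly what `java.sql.Timestamp.valueOf` parses, so a wire string can be turned back into a typed value.

```java
import java.sql.Timestamp;

public class DatetimeDecodeSketch {
    // A server that serializes datetimes as strings sends something like
    // "2022-04-27 10:15:30.123". A getString-style call hands this back
    // verbatim; a getObject-style path parses it into a typed object.
    static Timestamp decode(String raw) {
        return Timestamp.valueOf(raw);
    }
}
```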
LGTM except for one question
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSession.java
thanks, merged to master
What changes were proposed in this pull request?
This PR is mainly a refactoring, aiming to make it possible for the Thrift server to support TimestampNTZ and TimestampLTZ at the same time in the future.
As we all know, Spark has Hive dependencies, which can be classified into two types.
The problem here is that we currently convert Spark results to org.apache.hadoop.hive.serde2.thrift.Type first and then to org.apache.hive.service.rpc.thrift.TTypeId. The former does not have both timestamp types; in particular, it lacks TIMESTAMPLOCALTZ_TYPE. To avoid this, we take a shortcut and map Spark results to the Thrift schema and row set directly.
Besides, it also avoids some unnecessary memory copies from type to type.
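The shortcut can be pictured as a single-step mapping from the Spark type straight to the Thrift id, with no intermediate serde2 Type. A toy sketch; the nested enum is a simplified stand-in for the Thrift-generated `org.apache.hive.service.rpc.thrift.TTypeId`, and dispatching on type names is only for illustration (the real code matches on Spark `DataType` objects):

```java
public class DirectTypeMapping {
    // Hypothetical, simplified subset of the Thrift TTypeId enum.
    enum TTypeId { STRING_TYPE, DECIMAL_TYPE, DATE_TYPE, TIMESTAMP_TYPE }

    // One-step Spark-to-Thrift mapping: no detour through the serde2
    // Type enum, which lacks TIMESTAMPLOCALTZ_TYPE. Both timestamp
    // variants map to TIMESTAMP_TYPE for now, matching the PR's
    // "keep AS-IS" behavior.
    static TTypeId toTTypeId(String sparkTypeName) {
        switch (sparkTypeName) {
            case "StringType": return TTypeId.STRING_TYPE;
            case "DecimalType": return TTypeId.DECIMAL_TYPE;
            case "DateType": return TTypeId.DATE_TYPE;
            case "TimestampType":
            case "TimestampNTZType": return TTypeId.TIMESTAMP_TYPE;
            default: throw new IllegalArgumentException(sparkTypeName);
        }
    }
}
```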
Most functionalities have been verified in apache/kyuubi for several years.
Why are the changes needed?
To support more Spark data types through Hive JDBC.
Does this PR introduce any user-facing change?
How was this patch tested?
Existing unit tests shall be enough.