
[KYUUBI #4316] Fix returned Timestamp values may lose precision #4318

Closed · wants to merge 9 commits into master from pan3793:hive-string

Conversation

@pan3793 (Member) commented Feb 13, 2023

Why are the changes needed?

This PR proposes to use org.apache.spark.sql.execution#toHiveString in place of org.apache.kyuubi.engine.spark.schema#toHiveString, so that results are consistent with spark-sql and STS (the Spark Thrift Server).

Because of SPARK-32006, this only works with Spark 3.1 and above.

The patch takes effect on both the Thrift and Arrow result formats.
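For context, a minimal sketch of the new conversion path, assuming Spark 3.1+; the helper name `cellToHiveString` and its parameters are illustrative, not the exact Kyuubi code:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.execution.HiveResult
import org.apache.spark.sql.execution.HiveResult.TimeFormatters
import org.apache.spark.sql.types.DataType

// Create the date/timestamp formatters once per result set (the API added by
// SPARK-32006) and reuse them for every cell, instead of rebuilding them per value.
val formatters: TimeFormatters = HiveResult.getTimeFormatters

// nested = false because these are top-level column values, not elements
// inside a struct/array/map.
def cellToHiveString(row: Row, ordinal: Int, dataType: DataType): String =
  HiveResult.toHiveString((row.get(ordinal), dataType), nested = false, formatters)
```

Spark's fraction timestamp formatter keeps microsecond precision, which is what the beeline session below demonstrates.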

How was this patch tested?

  • Add some test cases that thoroughly check the changes, including negative and positive cases where possible

  • Add screenshots for manual tests if appropriate

```
➜  ~ beeline -u 'jdbc:hive2://0.0.0.0:10009/default'
Connecting to jdbc:hive2://0.0.0.0:10009/default
Connected to: Spark SQL (version 3.3.1)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.9 by Apache Hive
0: jdbc:hive2://0.0.0.0:10009/default> select to_timestamp('2023-02-08 22:17:33.123456789');
+----------------------------------------------+
| to_timestamp(2023-02-08 22:17:33.123456789)  |
+----------------------------------------------+
| 2023-02-08 22:17:33.123456                   |
+----------------------------------------------+
1 row selected (0.415 seconds)
```
  • Run tests locally (https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) before making a pull request

Review thread on this diff hunk:

```scala
  }
  values.add(value)
  values.add(
    HiveResult.toHiveString((row.get(ordinal), typ), false, getTimeFormatters(timeZone)))
```
Contributor commented:

`nested = true`

pan3793 (Member, Author) replied:

Why? It's a top-level value.

Contributor replied:

refer to org.apache.spark.sql.hive.thriftserver.RowSetUtils#toTColumn()

pan3793 (Member, Author) replied:

As discussed offline, nested should be false here, but it does not affect the output.
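To make the thread concrete: a hedged, spark-shell-runnable illustration of what `nested` controls in Spark's `HiveResult.toHiveString`. The sample values are made up; only string rendering differs, so the flag does not change timestamp output:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.execution.HiveResult
import org.apache.spark.sql.types.{StringType, TimestampType}

val fmt = HiveResult.getTimeFormatters

// Strings: top-level values are rendered raw, nested values are quoted.
HiveResult.toHiveString(("a", StringType), false, fmt) // a
HiveResult.toHiveString(("a", StringType), true, fmt)  // "a"

// Timestamps: formatted identically either way, which is why the
// nested = true / false choice above does not affect the output.
val ts = Timestamp.valueOf("2023-02-08 22:17:33.123456")
HiveResult.toHiveString((ts, TimestampType), false, fmt)
HiveResult.toHiveString((ts, TimestampType), true, fmt)
```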

Review thread on the new `RowSet.getTimeFormatters` helper:

```scala
import org.apache.kyuubi.util.RowSetUtils._

object RowSet {

  def getTimeFormatters(timeZone: ZoneId): TimeFormatters = {
```
Contributor commented:

Why not call HiveResult.getTimeFormatters directly?

pan3793 (Member, Author) replied:

Because SQLConf#get does not work properly in this case.

pan3793 (Member, Author):

Wrapped SQLConf.withExistingConf to see if it works

pan3793 (Member, Author):

Updated
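A minimal sketch of the pattern the thread converged on, assuming the root cause is that `SQLConf.get` is thread-local and may return defaults on an operation thread; the helper name `sessionTimeFormatters` is illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.HiveResult
import org.apache.spark.sql.execution.HiveResult.TimeFormatters
import org.apache.spark.sql.internal.SQLConf

// SQLConf.get reads a thread-local; a Kyuubi operation thread may not have the
// session's conf attached, so settings like spark.sql.session.timeZone would be
// missed. Binding the session conf explicitly lets HiveResult.getTimeFormatters
// pick up the correct time zone.
def sessionTimeFormatters(spark: SparkSession): TimeFormatters =
  SQLConf.withExistingConf(spark.sessionState.conf) {
    HiveResult.getTimeFormatters
  }
```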

@pan3793 pan3793 added this to the v1.7.0 milestone Feb 13, 2023
```diff
@@ -21,8 +21,9 @@ import java.time.ZoneId

 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.execution.HiveResult
```
Contributor commented:

Is there any compatibility issue for different Spark versions?

pan3793 (Member, Author) replied:

The signature has been stable since Spark 3.1, and the current implementation is compatible with Spark 3.1 and above.

@codecov-commenter commented Feb 14, 2023

Codecov Report

Merging #4318 (ba9016f) into master (10adbdc) will decrease coverage by 0.05%.
The diff coverage is 54.54%.

```
@@             Coverage Diff              @@
##             master    #4318      +/-   ##
============================================
- Coverage     53.59%   53.55%   -0.05%     
  Complexity       13       13              
============================================
  Files           562      562              
  Lines         30731    30689      -42     
  Branches       4159     4142      -17     
============================================
- Hits          16470    16434      -36     
- Misses        12701    12709       +8     
+ Partials       1560     1546      -14     
```
| Impacted Files | Coverage Δ |
| --- | --- |
| ...ain/scala/org/apache/kyuubi/util/RowSetUtils.scala | 26.08% <0.00%> (-57.25%) ⬇️ |
| .../kyuubi/jdbc/hive/arrow/ArrowColumnarBatchRow.java | 0.00% <0.00%> (ø) |
| ...kyuubi/engine/spark/operation/SparkOperation.scala | 76.31% <69.56%> (-0.40%) ⬇️ |
| .../kyuubi/engine/spark/operation/ExecutePython.scala | 81.46% <80.00%> (ø) |
| ...org/apache/kyuubi/engine/spark/schema/RowSet.scala | 98.52% <100.00%> (+7.35%) ⬆️ |
| ...g/apache/spark/sql/kyuubi/SparkDatasetHelper.scala | 80.00% <100.00%> (ø) |
| ...uubi/engine/spark/events/SparkOperationEvent.scala | 88.88% <0.00%> (-5.56%) ⬇️ |
| ...ache/kyuubi/sql/plan/trino/TrinoFeOperations.scala | 39.13% <0.00%> (-3.73%) ⬇️ |
| ...ver/trino/api/KyuubiTrinoOperationTranslator.scala | 11.76% <0.00%> (-1.57%) ⬇️ |
| ... and 9 more | |


@pan3793 pan3793 closed this in 8fe7947 Feb 14, 2023
pan3793 added a commit that referenced this pull request Feb 14, 2023
The commit message repeats the PR description above.

Closes #4318 from pan3793/hive-string.

Closes #4316

ba9016f [Cheng Pan] nit
8be774b [Cheng Pan] nit
bd696fe [Cheng Pan] nit
b5cf051 [Cheng Pan] fix
dd6b702 [Cheng Pan] test
63edd34 [Cheng Pan] nit
37cc70a [Cheng Pan] Fix python ut
c66ad22 [Cheng Pan] [KYUUBI #4316] Fix returned Timestamp values may lose precision
41d9444 [Cheng Pan] Revert "[KYUUBI #3958] Fix Spark session timezone format"

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 8fe7947)
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793 (Member, Author) commented Feb 14, 2023

Thanks, merged to master/1.7

@pan3793 pan3793 deleted the hive-string branch February 15, 2023 12:46