
[KYUUBI #4316] Fix returned Timestamp values may lose precision #4318

Closed · wants to merge 9 commits into master from pan3793:hive-string

Conversation

@pan3793 (Member) commented Feb 13, 2023

Why are the changes needed?

This PR proposes to use org.apache.spark.sql.execution#toHiveString in place of org.apache.kyuubi.engine.spark.schema#toHiveString, so that results are consistent with spark-sql and STS (the Spark Thrift Server).

Because of SPARK-32006, this only works with Spark 3.1 and above.

The patch takes effect on both the Thrift and Arrow result formats.
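For context, a minimal sketch of the new conversion path, assuming Spark 3.1+; the helper name `cellToHiveString` and its parameters are illustrative, not the exact Kyuubi code:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.execution.HiveResult
import org.apache.spark.sql.execution.HiveResult.TimeFormatters
import org.apache.spark.sql.types.DataType

// Create the date/timestamp formatters once per result set (the API added by
// SPARK-32006) and reuse them for every cell, instead of rebuilding them per value.
val formatters: TimeFormatters = HiveResult.getTimeFormatters

// nested = false because these are top-level column values, not elements
// inside a struct/array/map.
def cellToHiveString(row: Row, ordinal: Int, dataType: DataType): String =
  HiveResult.toHiveString((row.get(ordinal), dataType), nested = false, formatters)
```

Spark's fraction timestamp formatter keeps microsecond precision, which is what the beeline session below demonstrates.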

How was this patch tested?

  • Add some test cases that thoroughly check the changes, including negative and positive cases where possible

  • Add screenshots for manual tests if appropriate

```
➜  ~ beeline -u 'jdbc:hive2://0.0.0.0:10009/default'
Connecting to jdbc:hive2://0.0.0.0:10009/default
Connected to: Spark SQL (version 3.3.1)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.9 by Apache Hive
0: jdbc:hive2://0.0.0.0:10009/default> select to_timestamp('2023-02-08 22:17:33.123456789');
+----------------------------------------------+
| to_timestamp(2023-02-08 22:17:33.123456789)  |
+----------------------------------------------+
| 2023-02-08 22:17:33.123456                   |
+----------------------------------------------+
1 row selected (0.415 seconds)
```
  • Run tests locally (https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) before making a pull request

Review thread on this diff hunk:

```scala
  }
  values.add(value)
  values.add(
    HiveResult.toHiveString((row.get(ordinal), typ), false, getTimeFormatters(timeZone)))
```
Contributor commented:

`nested = true`

pan3793 (Member, Author) replied:

Why? It's a top-level value.

Contributor replied:

refer to org.apache.spark.sql.hive.thriftserver.RowSetUtils#toTColumn()

pan3793 (Member, Author) replied:

As discussed offline, nested should be false here, but it does not affect the output.
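To make the thread concrete: a hedged, spark-shell-runnable illustration of what `nested` controls in Spark's `HiveResult.toHiveString`. The sample values are made up; only string rendering differs, so the flag does not change timestamp output:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.execution.HiveResult
import org.apache.spark.sql.types.{StringType, TimestampType}

val fmt = HiveResult.getTimeFormatters

// Strings: top-level values are rendered raw, nested values are quoted.
HiveResult.toHiveString(("a", StringType), false, fmt) // a
HiveResult.toHiveString(("a", StringType), true, fmt)  // "a"

// Timestamps: formatted identically either way, which is why the
// nested = true / false choice above does not affect the output.
val ts = Timestamp.valueOf("2023-02-08 22:17:33.123456")
HiveResult.toHiveString((ts, TimestampType), false, fmt)
HiveResult.toHiveString((ts, TimestampType), true, fmt)
```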

Review thread on the new `RowSet.getTimeFormatters` helper:

```scala
import org.apache.kyuubi.util.RowSetUtils._

object RowSet {

  def getTimeFormatters(timeZone: ZoneId): TimeFormatters = {
```
Contributor commented:

Why not call HiveResult.getTimeFormatters directly?

pan3793 (Member, Author) replied:

Because SQLConf#get does not work properly in this case.

pan3793 (Member, Author):

Wrapped SQLConf.withExistingConf to see if it works

pan3793 (Member, Author):

Updated
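A minimal sketch of the pattern the thread converged on, assuming the root cause is that `SQLConf.get` is thread-local and may return defaults on an operation thread; the helper name `sessionTimeFormatters` is illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.HiveResult
import org.apache.spark.sql.execution.HiveResult.TimeFormatters
import org.apache.spark.sql.internal.SQLConf

// SQLConf.get reads a thread-local; a Kyuubi operation thread may not have the
// session's conf attached, so settings like spark.sql.session.timeZone would be
// missed. Binding the session conf explicitly lets HiveResult.getTimeFormatters
// pick up the correct time zone.
def sessionTimeFormatters(spark: SparkSession): TimeFormatters =
  SQLConf.withExistingConf(spark.sessionState.conf) {
    HiveResult.getTimeFormatters
  }
```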

@pan3793 pan3793 added this to the v1.7.0 milestone Feb 13, 2023
```diff
@@ -21,8 +21,9 @@ import java.time.ZoneId

 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.execution.HiveResult
```
Contributor commented:

Is there any compatibility issue for different Spark versions?

pan3793 (Member, Author) replied:

The signature has been stable since Spark 3.1, and the current implementation is compatible with Spark 3.1 and above.

@codecov-commenter commented Feb 14, 2023

Codecov Report

Merging #4318 (ba9016f) into master (10adbdc) will decrease coverage by 0.05%.
The diff coverage is 54.54%.

```
@@             Coverage Diff              @@
##             master    #4318      +/-   ##
============================================
- Coverage     53.59%   53.55%   -0.05%     
  Complexity       13       13              
============================================
  Files           562      562              
  Lines         30731    30689      -42     
  Branches       4159     4142      -17     
============================================
- Hits          16470    16434      -36     
- Misses        12701    12709       +8     
+ Partials       1560     1546      -14     
```
| Impacted Files | Coverage Δ |
| --- | --- |
| ...ain/scala/org/apache/kyuubi/util/RowSetUtils.scala | 26.08% <0.00%> (-57.25%) ⬇️ |
| .../kyuubi/jdbc/hive/arrow/ArrowColumnarBatchRow.java | 0.00% <0.00%> (ø) |
| ...kyuubi/engine/spark/operation/SparkOperation.scala | 76.31% <69.56%> (-0.40%) ⬇️ |
| .../kyuubi/engine/spark/operation/ExecutePython.scala | 81.46% <80.00%> (ø) |
| ...org/apache/kyuubi/engine/spark/schema/RowSet.scala | 98.52% <100.00%> (+7.35%) ⬆️ |
| ...g/apache/spark/sql/kyuubi/SparkDatasetHelper.scala | 80.00% <100.00%> (ø) |
| ...uubi/engine/spark/events/SparkOperationEvent.scala | 88.88% <0.00%> (-5.56%) ⬇️ |
| ...ache/kyuubi/sql/plan/trino/TrinoFeOperations.scala | 39.13% <0.00%> (-3.73%) ⬇️ |
| ...ver/trino/api/KyuubiTrinoOperationTranslator.scala | 11.76% <0.00%> (-1.57%) ⬇️ |
| ... and 9 more | |


@pan3793 pan3793 closed this in 8fe7947 Feb 14, 2023
pan3793 added a commit that referenced this pull request Feb 14, 2023
The commit message repeats the PR description above.

Closes #4318 from pan3793/hive-string.

Closes #4316

ba9016f [Cheng Pan] nit
8be774b [Cheng Pan] nit
bd696fe [Cheng Pan] nit
b5cf051 [Cheng Pan] fix
dd6b702 [Cheng Pan] test
63edd34 [Cheng Pan] nit
37cc70a [Cheng Pan] Fix python ut
c66ad22 [Cheng Pan] [KYUUBI #4316] Fix returned Timestamp values may lose precision
41d9444 [Cheng Pan] Revert "[KYUUBI #3958] Fix Spark session timezone format"

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 8fe7947)
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793 (Member, Author) commented Feb 14, 2023

Thanks, merged to master/1.7

@pan3793 pan3793 deleted the hive-string branch February 15, 2023 12:46