Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARROW] Fix Spark session timezone format in arrow-based result format #4326

Closed
wants to merge 10 commits into from

Conversation

cfmcgrady
Copy link
Contributor

@cfmcgrady cfmcgrady commented Feb 14, 2023

Why are the changes needed?

  1. this PR introduces a new configuration called kyuubi.operation.result.arrow.timestampAsString, when true, arrow-based rowsets will convert timestamp-type columns to strings for transmission.

  2. kyuubi.operation.result.arrow.timestampAsString default setting to false for better transmission performance

  3. the PR fixes timezone issue in arrow based result format described in Fix Spark session timezone format #3958

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

@cfmcgrady cfmcgrady changed the title [ARROW] Fix Spark session timezone format in arrow-based result format. [ARROW] Fix Spark session timezone format in arrow-based result format Feb 14, 2023
return colTypes;
}
/**
* 1. the complex types (map/array/struct) are always converted to string type to transport 2. if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move 2. *** to new line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spotless plugin makes this change, otherwise, we wouldn't pass the style check :(

Copy link
Member

@pan3793 pan3793 Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, leave it then

@codecov-commenter
Copy link

Codecov Report

Merging #4326 (38c7fc9) into master (8fe7947) will increase coverage by 0.07%.
The diff coverage is 32.00%.

@@             Coverage Diff              @@
##             master    #4326      +/-   ##
============================================
+ Coverage     53.53%   53.61%   +0.07%     
  Complexity       13       13              
============================================
  Files           562      562              
  Lines         30704    30804     +100     
  Branches       4142     4153      +11     
============================================
+ Hits          16438    16515      +77     
- Misses        12721    12732      +11     
- Partials       1545     1557      +12     
Impacted Files Coverage Δ
...he/kyuubi/jdbc/hive/KyuubiArrowBasedResultSet.java 0.00% <0.00%> (ø)
...he/kyuubi/jdbc/hive/KyuubiArrowQueryResultSet.java 0.00% <0.00%> (ø)
.../kyuubi/jdbc/hive/arrow/ArrowColumnarBatchRow.java 0.00% <0.00%> (ø)
...a/org/apache/kyuubi/jdbc/hive/KyuubiStatement.java 22.28% <11.11%> (-0.49%) ⬇️
...g/apache/spark/sql/kyuubi/SparkDatasetHelper.scala 78.57% <75.00%> (-1.43%) ⬇️
...uubi/engine/spark/operation/ExecuteStatement.scala 76.92% <100.00%> (+2.23%) ⬆️
...kyuubi/engine/spark/operation/SparkOperation.scala 76.47% <100.00%> (+0.15%) ⬆️
...in/scala/org/apache/kyuubi/config/KyuubiConf.scala 97.45% <100.00%> (-0.06%) ⬇️
.../apache/kyuubi/jdbc/hive/JdbcColumnAttributes.java 83.33% <100.00%> (ø)
.../org/apache/kyuubi/server/trino/api/v1/dto/Ok.java 46.15% <0.00%> (-23.08%) ⬇️
... and 25 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@pan3793 pan3793 added this to the v1.7.0 milestone Feb 18, 2023
@pan3793 pan3793 closed this in 6bd0016 Feb 18, 2023
pan3793 pushed a commit that referenced this pull request Feb 18, 2023
…ed result format

### _Why are the changes needed?_

1. this PR introduces a new configuration called `kyuubi.operation.result.arrow.timestampAsString`, when true, arrow-based rowsets will convert timestamp-type columns to strings for transmission.

2. `kyuubi.operation.result.arrow.timestampAsString` default setting to false for better transmission performance

3. the PR fixes timezone issue in arrow based result format described in #3958

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4326 from cfmcgrady/arrow-string-ts.

Closes #4326

38c7fc9 [Fu Chen] fix style
d864db0 [Fu Chen] address comment
b714b3e [Fu Chen] revert externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/schema/RowSet.scala
6c4eb50 [Fu Chen] minor
289b600 [Fu Chen] timstampAsString = false by default
78b7cab [Fu Chen] fix
f560135 [Fu Chen] debug info
b8e4b28 [Fu Chen] fix ut
87c6f9e [Fu Chen] update docs
86f6cb7 [Fu Chen] arrow based rowset timestamp as string

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 6bd0016)
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793
Copy link
Member

pan3793 commented Feb 18, 2023

Thanks, merged to master/1.7

SusurHe pushed a commit to SusurHe/incubator-kyuubi that referenced this pull request Feb 21, 2023
* 'master' of github.com:apache/kyuubi: (1557 commits)
  [KYUUBI apache#3951][FOLLOWUP] Audit the rest request params
  [KYUUBI apache#4377] Grant execute permission to release scripts
  [KYUUBI apache#4374] Release uploading should include kyuubi-spark-connector-hive
  [KYUUBI apache#4267] Show warning if SessionHandle is invalid
  [KYUUBI apache#4385] [DOCS] Refine release process
  [KYUUBI apache#4352] Support System.gc() with periodic GC interval
  [KYUUBI apache#4152][FOLLOWUP] LDAP configurations should be server-only
  [KYUUBI apache#4373] Using SVN_STAGING_REPO instead of SVN_STAGING_REPO in the release script to fix echo message
  [KYUUBI apache#4372] Support to return null value for OperationsResource rowset
  [KYUUBI apache#4371] Fix typo in `kyuubi_ecosystem.drawio`
  [KYUUBI apache#4216] Support to transfer client version for kyuubi hive jdbc and rest client sdk
  [KYUUBI apache#4345] Add the doc of kyuubi trino server
  [KYUUBI apache#3081][DOCS] Add Hudi connector doc in Trino
  [KYUUBI apache#4357] Bump Jersey from 2.38 to 2.39
  [KYUUBI apache#4338][FOLLOWUP] Fix K8s integration tests
  [KYUUBI apache#4326] [ARROW] Fix Spark session timezone format in arrow-based result format
  [KYUUBI apache#4360][FOLLOWUP] Get valid unlimited users from existing limiters instead of conf
  [KYUUBI apache#4362] Add `_configurations` in kerberos.rst
  [KYUUBI apache#4338] Bump Spark from 3.3.1 to 3.3.2
  [KYUUBI apache#4119][FOLLOWUP] Add app start time for batch api docs
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants