Skip to content

Conversation

@zhaomin1423
Copy link
Member

@zhaomin1423 zhaomin1423 commented Sep 20, 2023

Why are the changes needed?

close #5317 (comment)

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

Was this patch authored or co-authored using generative AI tooling?

No

…tion on reading Hive Avro partitioned table
@zhaomin1423 zhaomin1423 changed the title [KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException o… [KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table Sep 20, 2023
@zhaomin1423 zhaomin1423 requested review from pan3793 and yikf September 20, 2023 16:54
…tion on reading Hive Avro partitioned table
@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2023

Codecov Report

Merging #5319 (7944734) into master (cd325b4) will not change coverage.
Report is 2 commits behind head on master.
The diff coverage is 0.00%.

❗ Current head 7944734 differs from pull request most recent head d38832f. Consider uploading reports for the commit d38832f to get more accurate results

@@          Coverage Diff           @@
##           master   #5319   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         590     590           
  Lines       33429   33432    +3     
  Branches     4423    4422    -1     
======================================
- Misses      33429   33432    +3     
Files Changed Coverage Δ
...uubi/spark/connector/hive/read/HiveFileIndex.scala 0.00% <0.00%> (ø)
...nnector/hive/read/HivePartitionReaderFactory.scala 0.00% <0.00%> (ø)
...he/kyuubi/spark/connector/hive/read/HiveScan.scala 0.00% <0.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

| CREATE TABLE IF NOT EXISTS
| $table (id String, year String, month string)
| USING PARQUET
| USING $format
Copy link
Member

@pan3793 pan3793 Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how USING would be handled in this connector, I verified with STORED AS instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both work and also add unit tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as i known, This point depend on the implementation of the catalog. v2HiveCatalog creates tables and queries them, in fact, STORED AS & USING are the same thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small difference, the format of text does not work when STORED AS is used, only textfile, but USING work well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is caused by the parsing of spark, In spark, text file format is text, hive stored as is textfile

}
}

test("read partitioned avro table") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also add test for un-partitioned serde table

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and add other formats.

@pan3793 pan3793 requested a review from ulysses-you September 21, 2023 02:11
@yikf
Copy link
Contributor

yikf commented Sep 21, 2023

@zhaomin1423 Thanks, LGTM

@ulysses-you
Copy link
Contributor

lgtm

Comment on lines 50 to 54
val clause = if (hiveTable) {
"STORED AS"
} else {
"USING"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's inline it

Comment on lines 69 to 73
val hivePartition = if (bindHivePart.isDefined) {
HiveClientImpl.toHivePartition(bindHivePart.get, hiveTable)
} else {
null
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val hivePartition = if (bindHivePart.isDefined) {
HiveClientImpl.toHivePartition(bindHivePart.get, hiveTable)
} else {
null
}
bindHivePart.map(HiveClientImpl.toHivePartition(_, hiveTable)).orNull

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except to small nits

@zhaomin1423
Copy link
Member Author

LGTM, except to small nits

Update, thanks.

broadcastHiveConf,
nonPartitionReadDataKeys,
bindHivePart,
hivePartition,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm... seems we can refactor it to a Option[HivePartition] to avoid null propagation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zhaomin1423 zhaomin1423 self-assigned this Sep 21, 2023
bindPartition: HivePartition): Iterator[Writable] = {
bindPartition: Option[HivePartition]): Iterator[Writable] = {
// Obtain binding HivePartition from input partitioned file
val partDesc =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Option too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@pan3793 pan3793 added this to the v1.8.0 milestone Sep 21, 2023
@pan3793
Copy link
Member

pan3793 commented Sep 21, 2023

The last two commits only change the code style, merging to master/1.8

@pan3793 pan3793 closed this in 167e6c1 Sep 21, 2023
pan3793 added a commit that referenced this pull request Sep 21, 2023
…n reading Hive Avro partitioned table

### _Why are the changes needed?_

close #5317 (comment)

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes #5319 from zhaomin1423/fixhive-connector.

Closes #5317

02e5321 [Cheng Pan] nit
cadabf4 [Cheng Pan] nit
d38832f [zhaomin] improve
ee5b62d [zhaomin] improve
7944734 [zhaomin] improve
e3eca91 [zhaomin] add tests
d9302e2 [zhaomin] [KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table
0bc8ec1 [zhaomin] [KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table

Lead-authored-by: zhaomin <zhaomin1423@163.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 167e6c1)
Signed-off-by: Cheng Pan <chengpan@apache.org>
cxzl25 pushed a commit to cxzl25/kyuubi that referenced this pull request Nov 21, 2023
…tion on reading Hive Avro partitioned table

### _Why are the changes needed?_

close apache#5317 (comment)

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5319 from zhaomin1423/fixhive-connector.

Closes apache#5317

02e5321 [Cheng Pan] nit
cadabf4 [Cheng Pan] nit
d38832f [zhaomin] improve
ee5b62d [zhaomin] improve
7944734 [zhaomin] improve
e3eca91 [zhaomin] add tests
d9302e2 [zhaomin] [KYUUBI apache#5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table
0bc8ec1 [zhaomin] [KYUUBI apache#5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table

Lead-authored-by: zhaomin <zhaomin1423@163.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 167e6c1)
Signed-off-by: Cheng Pan <chengpan@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table

6 participants