[KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table #5319
…tion on reading Hive Avro partitioned table
Codecov Report
@@ Coverage Diff @@
## master #5319 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 590 590
Lines 33429 33432 +3
Branches 4423 4422 -1
======================================
- Misses 33429 33432 +3
| | CREATE TABLE IF NOT EXISTS | ||
| | $table (id String, year String, month string) | ||
| | USING PARQUET | ||
| | USING $format |
I'm not sure how USING would be handled in this connector, so I verified with STORED AS instead.
Both work; I've also added unit tests.
As far as I know, this depends on the implementation of the catalog. v2HiveCatalog creates the tables and queries them, so in practice STORED AS and USING are the same thing.
There is a small difference: with STORED AS, the format name text does not work (only textfile does), whereas USING works well.
This is caused by Spark's parsing: in Spark the text file format is named text, while Hive's STORED AS uses textfile.
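To make the thread above concrete, here is a minimal sketch of how a test helper might pick the clause and the format name. `DdlSketch` and `createTableDdl` are hypothetical names, not part of this PR, and the `text` to `textfile` mapping is only an illustration of the naming difference described in the comments.

```scala
object DdlSketch {
  // Hypothetical helper (an assumption for illustration, not code from the PR).
  // Builds a CREATE TABLE statement using either Hive's STORED AS clause
  // or Spark's USING clause.
  def createTableDdl(table: String, format: String, hiveTable: Boolean): String = {
    val clause = if (hiveTable) "STORED AS" else "USING"
    // Per the discussion: Spark names the plain-text format "text",
    // while Hive's STORED AS expects "textfile".
    val formatName =
      if (hiveTable && format.equalsIgnoreCase("text")) "textfile" else format
    s"""CREATE TABLE IF NOT EXISTS
       |$table (id String, year String, month string)
       |$clause $formatName""".stripMargin
  }
}
```

With this sketch, `DdlSketch.createTableDdl("t", "text", hiveTable = true)` ends in `STORED AS textfile`, while the same call with `hiveTable = false` ends in `USING text`.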
| } | ||
| } | ||
|
|
||
| test("read partitioned avro table") { |
Let's also add a test for an un-partitioned serde table.
Done, and added other formats.
@zhaomin1423 Thanks, LGTM
lgtm
| val clause = if (hiveTable) { | ||
| "STORED AS" | ||
| } else { | ||
| "USING" | ||
| } |
let's inline it
| val hivePartition = if (bindHivePart.isDefined) { | ||
| HiveClientImpl.toHivePartition(bindHivePart.get, hiveTable) | ||
| } else { | ||
| null | ||
| } |
| val hivePartition = if (bindHivePart.isDefined) { | |
| HiveClientImpl.toHivePartition(bindHivePart.get, hiveTable) | |
| } else { | |
| null | |
| } | |
| bindHivePart.map(HiveClientImpl.toHivePartition(_, hiveTable)).orNull |
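The suggestion above replaces an `if`/`else` that yields `null` with `Option.map(...).orNull`. A minimal, self-contained sketch of that equivalence follows; `CatalogPart`, `HivePart`, and `toHivePart` are stand-ins invented for illustration (the real code calls `HiveClientImpl.toHivePartition`, which needs Spark and Hive on the classpath).

```scala
object OrNullSketch {
  // Stand-in types, assumed for illustration only.
  final case class CatalogPart(spec: String)
  final class HivePart(val spec: String)

  // Stand-in for HiveClientImpl.toHivePartition(_, hiveTable).
  def toHivePart(part: CatalogPart): HivePart = new HivePart(part.spec)

  // Original shape: explicit branch that falls back to null.
  def verbose(bind: Option[CatalogPart]): HivePart =
    if (bind.isDefined) toHivePart(bind.get) else null

  // Suggested shape: map over the Option, then unwrap to null if empty.
  def concise(bind: Option[CatalogPart]): HivePart =
    bind.map(toHivePart).orNull
}
```

Both functions return `null` for `None` and a converted value for `Some(...)`; the second form avoids the `isDefined`/`get` pair.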
pan3793
left a comment
LGTM, except for two small nits
21a857d to
7944734
Compare
Updated, thanks.
| broadcastHiveConf, | ||
| nonPartitionReadDataKeys, | ||
| bindHivePart, | ||
| hivePartition, |
Emm... it seems we can refactor it to an Option[HivePartition] to avoid null propagation.
Done
| bindPartition: HivePartition): Iterator[Writable] = { | ||
| bindPartition: Option[HivePartition]): Iterator[Writable] = { | ||
| // Obtain binding HivePartition from input partitioned file | ||
| val partDesc = |
use Option too
done
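As a rough illustration of the `Option`-based refactor discussed above, the sketch below passes the bound partition as an `Option` and resolves a value from the partition when present, falling back to the table otherwise. `HivePart`, `resolveLocation`, and the location strings are assumptions made for this example; the actual connector code derives a partition descriptor, which is not reproduced here.

```scala
object OptionParamSketch {
  // Stand-in for the Hive partition type (assumed for illustration).
  final case class HivePart(location: String)

  // Taking Option[HivePart] instead of a nullable HivePart makes the
  // "no bound partition" case explicit at the call site and in the body.
  def resolveLocation(tableLocation: String, bindPartition: Option[HivePart]): String =
    bindPartition.map(_.location).getOrElse(tableLocation)
}
```

A caller reading an un-partitioned table passes `None` and gets the table-level value back, with no null check needed anywhere along the path.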
The last two commits only change the code style; merging to master/1.8.
…n reading Hive Avro partitioned table

### _Why are the changes needed?_

close #5317 (comment)

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes #5319 from zhaomin1423/fixhive-connector.

Closes #5317

02e5321 [Cheng Pan] nit
cadabf4 [Cheng Pan] nit
d38832f [zhaomin] improve
ee5b62d [zhaomin] improve
7944734 [zhaomin] improve
e3eca91 [zhaomin] add tests
d9302e2 [zhaomin] [KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table
0bc8ec1 [zhaomin] [KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table

Lead-authored-by: zhaomin <zhaomin1423@163.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 167e6c1)
Signed-off-by: Cheng Pan <chengpan@apache.org>
Why are the changes needed?
close #5317 (comment)
How was this patch tested?
Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before make a pull request
Was this patch authored or co-authored using generative AI tooling?
No