[HUDI-57] Added Orc Writer to Support Orc in Hudi#2320
[HUDI-57] Added Orc Writer to Support Orc in Hudi#2320manijndl7 wants to merge 2 commits intoapache:masterfrom
Conversation
manijndl7
commented
Dec 10, 2020
- Added Orc Writer Integrated it with Factory with all Major supported Data Types.
- Tested On docker env with hive-exec-2.3.1 and orc-core-1.3.3
- Added Orc Utils to Support Major Operation with Orc files.
|
@vinothchandar @n3nash can you please have a look |
Codecov Report
@@ Coverage Diff @@
## master #2320 +/- ##
============================================
- Coverage 52.29% 52.18% -0.12%
Complexity 2630 2630
============================================
Files 329 331 +2
Lines 14743 14773 +30
Branches 1484 1488 +4
============================================
- Hits 7710 7709 -1
- Misses 6419 6449 +30
- Partials 614 615 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
Hi @manijndl7 , thanks for your contribution. Excited to see this feature coming! |
|
Thanks @garyli1019 for checking this !! yea its in my pipeline , i am currently focusing on writing ORC Reader then i will commit the test cases. |
vinothchandar
left a comment
There was a problem hiding this comment.
Looks like a good start! I will mark this PR WIP for now.
Is there a good first milestone we are targeting here? for e.g Write data using Spark Datasource and read via Hive, for COW tables.
| <dependency> | ||
| <groupId>org.apache.hive</groupId> | ||
| <artifactId>hive-exec</artifactId> | ||
| <version>2.3.1</version> |
There was a problem hiding this comment.
could we use the hive.version property that we already have?
| <dependency> | ||
| <groupId>org.apache.orc</groupId> | ||
| <artifactId>orc-core</artifactId> | ||
| <version>1.3.3</version> |
There was a problem hiding this comment.
introduce a property for orc version just like we ahve for parquet?
| private static final String DEFAULT_MERGE_DATA_VALIDATION_CHECK_ENABLED = "false"; | ||
|
|
||
|
|
||
| public static final String ORC_STRIPE_SIZE = "hoodie.hive.exec.orc.default.stripe.size"; |
There was a problem hiding this comment.
we could move these to HoodieStorageConfig along with other parquet options?
|
|
||
| public class OrcUtils { | ||
|
|
||
| public static TypeInfo getOrcField(Schema fieldSchema) throws IllegalArgumentException { |
There was a problem hiding this comment.
is this something you have written from scratch? wondering if there is no built in way like parquet-avro to aid this conversion. Guess not?
There was a problem hiding this comment.
if this code is borrowed or reused from somewhere else, then we would have to cite/credit the source
|
AFAIK @prashantwason @vingov are also interested in this. @manijndl7 are you still actively working on this? or do you mind if one of them take this across finish line/merge efforts? |
|
Yes, we are interested in ORC support. We have already started an internal project towards this end. |
|
@prashantwason please feel free to grab HUDI-57. |
|
Hi @vinothchandar , my progress is bit slow because of current project going on i would be happy if someone make it to the finish line |
|
@manijndl7 thanks for letting us know! no worries! hope to see you back soon |
|
@vinothchandar can i close this PR ? since other people will work on this. |
|
yes. agreed |