[SPARK-45664][SQL] Introduce a mapper for orc compression codecs #43528
Conversation
ping @dongjoon-hyun cc @srowen @viirya
Just a question. May I ask why we do this ORC-specific change? Are you going to do the same things for all data sources like Parquet and Avro at Apache Spark 4.0.0?
Because the compression codecs supported by ORC and those supported by Spark are not completely one-to-one (Spark introduces two extra codecs), yes, I'm doing the same thing for Parquet and Avro at Apache Spark 4.0.0.
Got it. Thank you for keeping Apache Spark consistent.
I'm doing the same things for Parquet and Avro at Apache Spark 4.0.0.
@dongjoon-hyun Thank you!
…rings copy from parquet|orc|avro compression codes

### What changes were proposed in this pull request?
This PR follows up #43562, #43528 and #43308. The aim of this PR is to avoid copying magic strings from the `parquet|orc|avro` compression codecs. This PR also simplifies some test cases.

### Why are the changes needed?
To avoid copying magic strings from the `parquet|orc|avro` compression codecs.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Existing test cases.

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #43604 from beliefer/parquet_orc_avro.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
Currently, Spark supports all of the ORC compression codecs, but the codecs supported by ORC and those supported by Spark are not completely one-to-one, because Spark introduces two additional codecs, `NONE` and `UNCOMPRESSED`. On the other hand, there are a lot of magic strings copied from the ORC compression codecs. This forces developers to maintain the consistency manually; it is easy to make mistakes and reduces development efficiency.
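The mapper described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual class added by this PR: the object name `OrcCompressionCodecMapper`, the method name `toOrcName`, and the exact codec list are assumptions. The key idea is that both of Spark's extra codec names, `NONE` and `UNCOMPRESSED`, resolve to ORC's single uncompressed codec, and every other lookup goes through one table instead of scattered magic strings:

```scala
// Hypothetical sketch of a Spark-to-ORC compression codec mapper.
// Names and codec list are assumptions for illustration; the real
// Spark implementation differs.
object OrcCompressionCodecMapper {
  // Spark-level codec names mapped to the ORC codec names they resolve to.
  // Both NONE and UNCOMPRESSED map to ORC's single "NONE" codec.
  private val sparkToOrc: Map[String, String] = Map(
    "NONE"         -> "NONE",
    "UNCOMPRESSED" -> "NONE",
    "SNAPPY"       -> "SNAPPY",
    "ZLIB"         -> "ZLIB",
    "LZO"          -> "LZO",
    "ZSTD"         -> "ZSTD",
    "LZ4"          -> "LZ4"
  )

  /** Resolve a (case-insensitive) Spark codec name to its ORC codec name. */
  def toOrcName(sparkName: String): String =
    sparkToOrc.getOrElse(
      sparkName.toUpperCase,
      throw new IllegalArgumentException(s"Unsupported codec: $sparkName"))
}
```

With a single mapper like this, test suites and option parsing can iterate over the mapper's entries instead of each hard-coding its own copy of the codec names.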
Why are the changes needed?
Make it easier for developers to use the ORC compression codecs.
Does this PR introduce any user-facing change?
'No'.
It only introduces a new class.
How was this patch tested?
Existing test cases.
Was this patch authored or co-authored using generative AI tooling?
'No'.