
[SPARK-45664][SQL] Introduce a mapper for orc compression codecs #43528

Closed
wants to merge 1 commit

Conversation

@beliefer (Contributor)

What changes were proposed in this pull request?

Currently, Spark supports all of the ORC compression codecs, but the codecs supported by ORC and those supported by Spark do not map one-to-one, because Spark introduces two codec names, NONE and UNCOMPRESSED, that both mean no compression.

On the other hand, there are a lot of magic strings copied from the ORC compression codecs, so developers must manually keep them consistent. This is error-prone and reduces development efficiency.
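The mapper described above might look roughly like the following sketch. This is an illustration based on the PR description, not the merged Spark code: the class and member names (`OrcCodecMapperSketch`, `SparkOrcCodec`, `fromName`) are assumptions, and the nested `OrcKind` enum stands in for ORC's `org.apache.orc.CompressionKind` so the example compiles without the ORC jar.

```java
// Hypothetical sketch of a Spark-to-ORC codec mapper (not Spark's actual class).
public class OrcCodecMapperSketch {
    // Stand-in for org.apache.orc.CompressionKind.
    enum OrcKind { NONE, ZLIB, SNAPPY, LZO, LZ4, ZSTD }

    // Spark-facing codec names. NONE and UNCOMPRESSED are the two
    // Spark-only names; both map to ORC's NONE, which is why the
    // mapping is not one-to-one.
    enum SparkOrcCodec {
        NONE(OrcKind.NONE),
        UNCOMPRESSED(OrcKind.NONE),
        ZLIB(OrcKind.ZLIB),
        SNAPPY(OrcKind.SNAPPY),
        LZO(OrcKind.LZO),
        LZ4(OrcKind.LZ4),
        ZSTD(OrcKind.ZSTD);

        private final OrcKind kind;
        SparkOrcCodec(OrcKind kind) { this.kind = kind; }
        OrcKind toOrc() { return kind; }

        // Case-insensitive lookup: one place to resolve codec names
        // instead of magic strings scattered across the code base.
        static SparkOrcCodec fromName(String name) {
            return valueOf(name.toUpperCase(java.util.Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        System.out.println(SparkOrcCodec.fromName("uncompressed").toOrc()); // NONE
        System.out.println(SparkOrcCodec.fromName("zstd").toOrc());        // ZSTD
    }
}
```

With such an enum, tests can iterate over `SparkOrcCodec.values()` instead of hard-coding codec-name strings, which is the consistency benefit the description mentions.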

Why are the changes needed?

Make it easier for developers to use the ORC compression codecs.

Does this PR introduce any user-facing change?

'No'. It introduces a new class.

How was this patch tested?

Existing test cases.

Was this patch authored or co-authored using generative AI tooling?

'No'.

github-actions bot added the SQL label Oct 25, 2023
beliefer force-pushed the SPARK-45664 branch 2 times, most recently from 44f3030 to b349200 (October 25, 2023 09:33)
@beliefer (Contributor Author)

ping @dongjoon-hyun cc @srowen @viirya

beliefer force-pushed the SPARK-45664 branch 2 times, most recently from 00b7b83 to 655fbeb (October 26, 2023 03:37)
@dongjoon-hyun (Member) left a comment

Just a question. May I ask why we do this ORC-specific change? Are you going to do the same things for all data sources like Parquet and Avro at Apache Spark 4.0.0?

@beliefer (Contributor Author)

> Just a question. May I ask why we do this ORC-specific change? Are you going to do the same things for all data sources like Parquet and Avro at Apache Spark 4.0.0?

Because the compression codecs supported by ORC and those supported by Spark do not map one-to-one: Spark introduces two codec names, NONE and UNCOMPRESSED. This change also makes tests easier to write and reduces the number of magic strings.

I'm doing the same thing for Parquet and Avro at Apache Spark 4.0.0.

beliefer force-pushed the SPARK-45664 branch 2 times, most recently from 3258d65 to a1b8ddd (October 28, 2023 11:19)
@dongjoon-hyun (Member) left a comment

> I'm doing the same things for Parquet and Avro at Apache Spark 4.0.0.

Got it. Thank you for keeping Apache Spark consistent.

@beliefer (Contributor Author)

@dongjoon-hyun Thank you!

dongjoon-hyun pushed a commit that referenced this pull request Oct 31, 2023
…rings copy from parquet|orc|avro compression codes

### What changes were proposed in this pull request?
This PR follows up #43562, #43528 and #43308.
The aim of this PR is to avoid magic strings copied from the `parquet|orc|avro` compression codecs.

This PR also simplifies some test cases.

### Why are the changes needed?
Avoid copying magic strings from the parquet|orc|avro compression codecs.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Existing test cases.

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #43604 from beliefer/parquet_orc_avro.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>