[SPARK-32712][SQL] Support writing Hive bucketed table (Hive file formats with Hive hash) #34103
c21 wants to merge 2 commits into apache:master
Conversation
df,
bucketIdExpression,
getBucketIdFromFileName)
withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") {
This is added because the Hive write code path enforces it - https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala#L161 .
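For illustration, a minimal sketch of what this looks like in a test. `withSQLConf` is Spark's test helper from `SQLTestUtils`; the table and column names below are hypothetical:

```scala
// Minimal sketch (hypothetical table/column names): wrap the dynamic
// partition insert in the conf, since InsertIntoHiveTable rejects dynamic
// partition writes while hive.exec.dynamic.partition.mode is "strict".
withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") {
  sql(
    """
      |INSERT OVERWRITE TABLE bucketed_table PARTITION (ds)
      |SELECT key, value, ds FROM source_table
      |""".stripMargin)
}
```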
@cloud-fan - could you help take a look when you have time? Thanks.

thanks, merging to master!

Thank you @cloud-fan for the review!
What changes were proposed in this pull request?
This is to support writing Hive bucketed tables with Hive file formats (the code path for Hive table writes, InsertIntoHiveTable). Rows are assigned to buckets with Hive's hash function, the same scheme used by Hive, Presto, and Trino.

Why are the changes needed?
To make Spark write bucketed tables that are compatible with other SQL engines. Same motivation as #33432.
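For context, a hedged sketch of the kind of table this enables Spark to populate (table name, schema, and file format are illustrative; a Hive-enabled SparkSession is assumed):

```scala
// Hypothetical example: a Hive-format bucketed table that, with this PR,
// Spark can write so that Hive/Presto/Trino read the buckets correctly.
spark.sql(
  """
    |CREATE TABLE hive_bucketed_tbl (key INT, value STRING)
    |CLUSTERED BY (key) SORTED BY (key) INTO 8 BUCKETS
    |STORED AS ORC
    |""".stripMargin)

// Each row lands in the bucket given by Hive's hash of `key` modulo the
// bucket count, matching Hive's own bucketing scheme.
spark.sql("INSERT INTO hive_bucketed_tbl SELECT id, CAST(id AS STRING) FROM range(100)")
```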
Does this PR introduce any user-facing change?
Yes. Before this PR, writing to these Hive bucketed tables would throw an exception in Spark if the config "hive.enforce.bucketing" or "hive.enforce.sorting" is set to true. After this PR, writing to these Hive bucketed tables succeeds, and the table can be read back efficiently by Presto and Trino like any other Hive bucketed table.
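A hedged before/after sketch of that behavior change, using the `withSQLConf` test helper and the hypothetical table from above:

```scala
// With enforcement enabled, an insert into a Hive-bucketed table used to
// fail rather than silently produce non-bucketed output.
withSQLConf(
    "hive.enforce.bucketing" -> "true",
    "hive.enforce.sorting" -> "true") {
  // Before this PR: the write throws an exception for the bucketed table.
  // After this PR: the write succeeds and produces Hive-hash-bucketed files.
  spark.sql("INSERT OVERWRITE TABLE hive_bucketed_tbl SELECT key, value FROM src")
}
```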
How was this patch tested?
Modified the unit test in
BucketedWriteWithHiveSupportSuite.scala to verify that bucket file names and the rows in each bucket are written properly for the Hive write code path as well.
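As a rough sketch of the shape of that check (the file-name regex and helper below are illustrative assumptions, not the exact code in the suite):

```scala
// Hypothetical sketch: Hive bucket files are conventionally named with the
// bucket id as a zero-padded prefix (e.g. "000002_0"). Parse that id, then
// verify every row in the file hashes to the same bucket.
private val hiveBucketedFileName = """^(\d+)_0.*$""".r

private def getBucketIdFromFileName(fileName: String): Int = fileName match {
  case hiveBucketedFileName(bucketId) => bucketId.toInt
  case other => sys.error(s"Cannot parse bucket id from file name: $other")
}
```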