
[SPARK-19256][SQL] Remove ordering enforcement from FileFormatWriter and let planner do that #20206

Closed
wants to merge 3 commits

Conversation

@tejasapatil (Contributor) commented Jan 9, 2018

What changes were proposed in this pull request?

This is as per the discussion in #19483 (comment). Enforcing a Sort at the right places in the tree is something that EnsureRequirements should take care of. This PR removes the Sort node insertion done inside FileFormatWriter.
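To make the division of responsibility concrete, here is a minimal sketch of the idea using toy case classes rather than Spark's real API: each node declares the ordering it requires of its child, and an EnsureRequirements-style rule inserts a Sort only where the child's output ordering does not already satisfy that requirement. The class names and shapes below are illustrative assumptions, not Spark's actual planner types.

```scala
// Toy plan model: each node exposes the ordering its output satisfies.
sealed trait Plan {
  def outputOrdering: Seq[String] // columns the node's output is sorted by
}
case class Scan(cols: Seq[String]) extends Plan {
  def outputOrdering: Seq[String] = Nil // a plain scan guarantees no ordering
}
case class Sort(order: Seq[String], child: Plan) extends Plan {
  def outputOrdering: Seq[String] = order
}
case class Write(requiredOrdering: Seq[String], child: Plan) extends Plan {
  def outputOrdering: Seq[String] = child.outputOrdering
}

// EnsureRequirements-style rule: add a Sort only when the requirement is unmet.
def ensureRequirements(plan: Plan): Plan = plan match {
  case w @ Write(required, child)
      if required.nonEmpty && child.outputOrdering != required =>
    w.copy(child = Sort(required, child))
  case other => other
}

val planned = ensureRequirements(Write(Seq("j", "k"), Scan(Seq("i", "j", "k"))))
println(planned) // Write(List(j, k),Sort(List(j, k),Scan(List(i, j, k))))
```

If the child already produces the required ordering (for example, it is itself a matching Sort), the rule leaves the plan untouched, which is exactly the redundancy the PR avoids by not hard-coding Sort insertion into the writer.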

How was this patch tested?

  • Existing unit tests
  • Inspected the query plan for a bucketed insert and confirmed that a Sort was added to the plan by the planner.
scala> hc.sql(" desc formatted test1  ").collect.foreach(println)
.....
[Num Buckets,8,]
[Bucket Columns,[`j`, `k`],]
[Sort Columns,[`j`, `k`],]


scala> hc.sql(" EXPLAIN INSERT OVERWRITE TABLE test1 SELECT * FROM test2 ").collect.foreach(println)
[== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand InsertIntoHadoopFsRelationCommand file:/warehouse/test1, false, 8 buckets, bucket columns: [j, k], sort columns: [j, k], ...
+- *Sort [pmod(hash(j#56, k#57, 42), 8) ASC NULLS FIRST, j#56 ASC NULLS FIRST, k#57 ASC NULLS FIRST], false, 0
   +- *FileScan orc default.test2[i#55,j#56,k#57] Batched: false, Format: ORC, Location: InMemoryFileIndex[file:/warehouse/test2], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<i:int,j:int,k:string>]
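The first sort key in the plan above, pmod(hash(j#56, k#57, 42), 8), is the row's bucket id: Spark hashes the bucket columns and wraps the result in pmod so the bucket id is non-negative even when the hash is negative, which Scala's plain % would not guarantee. The sketch below shows only the pmod step; it is not Spark's actual hash implementation.

```scala
// pmod: positive modulus. Unlike %, the result is always in [0, n) for n > 0,
// so a negative hash still maps to a valid bucket id.
def pmod(a: Int, n: Int): Int = {
  val r = a % n
  if (r < 0) r + n else r
}

println(pmod(-7, 8)) // 1, whereas -7 % 8 == -7 in Scala
println(pmod(13, 8)) // 5
```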

@SparkQA commented Jan 9, 2018

Test build #85859 has finished for PR 20206 at commit 652dca2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 10, 2018

Test build #85891 has finished for PR 20206 at commit 1008b2e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 11, 2018

Test build #85936 has finished for PR 20206 at commit 8c91ff9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil (Contributor, Author) commented:

Jenkins retest this please

@SparkQA commented Jan 11, 2018

Test build #85946 has finished for PR 20206 at commit 8c91ff9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil (Contributor, Author) commented:

cc @cloud-fan @gengliangwang for review

} else {
// SPARK-21165: the `requiredOrdering` is based on the attributes from analyzed plan, and
// the physical plan may have different attribute ids due to optimizer removing some
// aliases. Here we bind the expression ahead to avoid potential attribute ids mismatch.
Contributor:
This concern is still valid, the DataWritingCommand.requiredChildOrdering is based on logical plan's output attribute ids, how can we safely apply it in DataWritingCommandExec?
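The concern can be illustrated with a toy model (these are not Spark's classes): attributes carry an exprId, and the optimizer may produce a physical plan whose attributes have different exprIds than the analyzed plan. Binding resolves an attribute to its ordinal in the child's output up front, so later evaluation no longer depends on the exprIds still matching. All names below are hypothetical.

```scala
// Toy attribute model: a column reference identified by name plus exprId.
final case class Attribute(name: String, exprId: Long)
// A bound reference points at an ordinal position, not an exprId.
final case class BoundReference(ordinal: Int)

// Bind by exprId: succeeds only if the child's output still carries the id.
def bind(attr: Attribute, childOutput: Seq[Attribute]): Option[BoundReference] = {
  val ordinal = childOutput.indexWhere(_.exprId == attr.exprId)
  if (ordinal >= 0) Some(BoundReference(ordinal)) else None
}

val required        = Attribute("j", 56)                  // from the analyzed plan
val analyzedOutput  = Seq(Attribute("j", 56), Attribute("k", 57))
val optimizedOutput = Seq(Attribute("j", 99), Attribute("k", 57)) // alias removed, new id

println(bind(required, analyzedOutput))  // Some(BoundReference(0))
println(bind(required, optimizedOutput)) // None: the id no longer matches
```

Binding against the analyzed plan's output before optimization sidesteps the mismatch, which is what the SPARK-21165 comment in the removed code was guarding against.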

*
* table type | requiredOrdering
* -----------------+-------------------------------------------------
* normal table | partition columns
Contributor:
nit: non-bucketed table, a partitioned table is not a normal table...

@@ -150,6 +152,10 @@ case class InsertIntoHadoopFsRelationCommand(
}
}

val partitionSet = AttributeSet(partitionColumns)
val dataColumns = query.output.filterNot(partitionSet.contains)
Contributor:
We should use outputColumns instead of query.output, cc @gengliangwang

Member:
+1, it should be outputColumns here, which is the output columns of analyzed plan. See #20020 for details.

@HyukjinKwon (Member) commented:

Hi all, any update here?

@github-actions:

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 14, 2020
@github-actions github-actions bot closed this Jan 15, 2020