
[SPARK-35725][SQL] Support optimize skewed partitions in RebalancePartitions #32883

Closed. Wants to merge 20 commits into master from ulysses-you/expand-partition.

Conversation

@ulysses-you (Contributor) commented Jun 11, 2021

What changes were proposed in this pull request?

  • Add a new rule OptimizeSkewInRebalancePartitions to the AQE queryStageOptimizerRules
  • Add a new config spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled to decide whether to enable the new rule

The new rule OptimizeSkewInRebalancePartitions only handles two shuffle origins, REBALANCE_PARTITIONS_BY_NONE and REBALANCE_PARTITIONS_BY_COL, for the data skew issue, and reuses the existing config ADVISORY_PARTITION_SIZE_IN_BYTES to decide the target partition size.
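The split itself mirrors what OptimizeSkewedJoin does: a reduce partition larger than the advisory size is replaced by several partial readers, each covering a contiguous range of map outputs. A minimal Python sketch of that grouping idea (illustrative only, not the actual Scala code in ShufflePartitionsUtil):

```python
def split_skewed_partition(map_sizes, target_size):
    """Group the map outputs feeding one skewed reduce partition into
    contiguous chunks of roughly target_size bytes. Each chunk would
    become one partial reader over a range of map outputs."""
    specs = []                      # (start_map_index, end_map_index_exclusive)
    start, current = 0, 0
    for i, size in enumerate(map_sizes):
        # close the current chunk when this map output would overshoot it
        if current + size > target_size and current > 0:
            specs.append((start, i))
            start, current = i, 0
        current += size
    specs.append((start, len(map_sizes)))
    return specs

# One reduce partition fed by five 100-byte map outputs, target 250 bytes:
print(split_skewed_partition([100] * 5, 250))   # → [(0, 2), (2, 4), (4, 5)]
```

The greedy pass never splits a single map output, so a chunk can exceed the target when one mapper alone is oversized, matching the granularity limit of map-output-range readers.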

Why are the changes needed?

Currently, AQE does not support expanding (splitting) partitions dynamically, which is unfriendly to jobs with data skew.

Say we have a simple query:

SELECT /*+ REBALANCE(col) */ * FROM table

If the column col is skewed, some shuffle partitions will handle much more data than others.

Because no extra shuffle needs to be introduced, we can optimize this case by expanding partitions in AQE.
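To see why rebalancing by a skewed column yields uneven partitions, here is a small self-contained Python illustration (zlib.crc32 stands in for Spark's hash partitioner; the data and partition count are made up):

```python
import zlib
from collections import Counter

# 97% of the rows share one hot key, so hash partitioning routes
# almost all of the data to a single reduce partition.
rows = ["hot"] * 97 + ["a", "b", "c"]
num_partitions = 4
sizes = Counter(zlib.crc32(key.encode()) % num_partitions for key in rows)
print(sorted(sizes.values(), reverse=True))   # one partition dominates
```

Splitting that dominant partition into advisory-sized pieces in AQE evens out the work without adding another shuffle.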

Does this PR introduce any user-facing change?

Yes, a new config

How was this patch tested?

Add test

@github-actions bot added the SQL label Jun 11, 2021
None
}
}

Contributor Author

This code moved to ShufflePartitionsUtil so that the new rule ExpandShufflePartitions can use it.

@ulysses-you (Contributor Author)

cc @maropu @cloud-fan @yaooqinn @wangyum @JkSelf, thank you for reviewing.


SparkQA commented Jun 11, 2021

Test build #139702 has started for PR 32883 at commit 1eef9fb.

@ulysses-you (Contributor Author)

retest this please


SparkQA commented Jun 12, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44251/


SparkQA commented Jun 12, 2021

Test build #139726 has finished for PR 32883 at commit 1eef9fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
object ExpandShufflePartitions extends CustomShuffleReaderRule {
override def supportedShuffleOrigins: Seq[ShuffleOrigin] =
Seq(REPARTITION_BY_COL, REPARTITION_BY_NONE)
@cloud-fan (Contributor) commented Jun 16, 2021

Why can we accept REPARTITION_BY_COL? If people do df.repartition($"a"), we should make sure the output is hash-partitioned by column a, shouldn't we? Even if it's the last operator, like df.repartition($"a").collect.

Contributor

I think it only makes sense if the repartition($"a") is added by the framework to optimize table insertion, not added by users.

Contributor

This makes me think that we may need a new operator to do the repartition for partitioned table insertion (a non-partitioned table can use the existing operator, thanks to ce16369), and assign it a new shuffle origin.

Contributor Author

If people do df.repartition($"a"), we should make sure the output is hash partitioned by column a, isn't it

Yes, we should guarantee this.

I think it only makes sense if the repartition($"a") is added by the framework to optimize table insertion, not added by users.

Yes, we can use a new operator and shuffle origin to distinguish whether it was added by the user or by the framework, and then only optimize the operator added by the framework.

The original idea of this PR was to add a config letting the user decide whether repartition($"a") may expand partitions (which breaks the semantics), so it could be used easily in SQL queries. After some thought, maybe it's better to add a new hint that supports expanding partitions?

Contributor

We can start with the new operator first, and think of the user-facing API later. Maybe we don't need a user-facing API and the new operator can only be used by the table insertion optimizer rule.

Contributor Author

@cloud-fan created #32932. To show the usage, that PR adds a new hint.

Member

Do we try to improve this case?

CREATE TABLE t1 USING parquet PARTITIONED BY (p2)
AS
(SELECT id, id % 1000 AS p1, if(id < 3000000, 1, id % 100) AS p2 FROM range(3100000) distribute by p2)

Contributor Author

Yes, there are two approaches to optimize the case you gave:

  • this PR can optimize it directly using a new config, but a potential issue is that the distribute by semantics of the output partitioning would be changed.
  • #32932 is going to use a new operator which does not guarantee the output partitioning, so we can optimize the new operator safely.

@HyukjinKwon (Member)

cc @maryannxue can you review this please?


SparkQA commented Jun 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44785/


SparkQA commented Jun 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44785/


SparkQA commented Jun 24, 2021

Test build #140256 has finished for PR 32883 at commit 844393e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -642,6 +642,15 @@ object SQLConf {
.bytesConf(ByteUnit.BYTE)
.createWithDefault(0L)

val ADAPTIVE_EXPAND_PARTITIONS_ENABLED =
Contributor

We don't need a config, as people can exclude any optimizer rule by name.

Contributor Author

We can't do that; currently only the AQE optimizer supports excluding rules.

Contributor

OK, then we need to think about the config name.

How about spark.sql.adaptive.optimizeSkewsInRebalancePartitions?

Contributor Author

updated

@@ -99,6 +99,7 @@ case class AdaptiveSparkPlanExec(
// Skew join does not handle `CustomShuffleReader` so needs to be applied first.
OptimizeSkewedJoin,
CoalesceShufflePartitions(context.session),
ExpandShufflePartitions,
Contributor

Shall we do it before CoalesceShufflePartitions, to follow OptimizeSkewedJoin?

Contributor Author

yea, seems better

* / \
* r0:[m0-b0, m1-b0, m2-b0], r1:[m0-b1], r2:[m1-b1], r3:[m2-b1], r4[m0-b2, m1-b2, m2-b2]
*/
object ExpandShufflePartitions extends CustomShuffleReaderRule {
Contributor

I'd name it OptimizeSkewedRebalance

Contributor Author

OptimizeSkewedPartitions ?

updateShuffleReaders(plan)
}

override def apply(plan: SparkPlan): SparkPlan = {
Contributor

I think the case we want to support is pretty simple: the root rebalance node.

override def apply(plan: SparkPlan): SparkPlan = plan match {
  case s: ShuffleQueryStageExec ...

  case _ => plan
}

Contributor Author

Why do we need this limitation? People may use rebalance in some other places. An extreme case:

Generate
  Rebalance
    Aggregate(skewed)

Contributor

If you want to expand the use cases of Rebalance, please mention it explicitly and discuss the relevant changes we need to make. Otherwise, let's assume Rebalance is only used to optimize file writing and not introduce complexity for non-existing use cases.

Contributor Author

makes sense

* CoalescedPartition for normal partition.
* - has `ShufflePartitionSpec` before expand that should be CoalescedPartition
* We use some PartialReducerPartitionSpecs to replace CoalescedPartition for every
* large partition, and use origin ShufflePartitionSpec for normal partition.
Contributor

The logic looks overly complicated. Is it because we run the new rule after CoalesceShufflePartitions?

Contributor Author

I think so, will simplify it


SparkQA commented Jun 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44794/


SparkQA commented Jun 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44794/


SparkQA commented Jun 24, 2021

Test build #140264 has finished for PR 32883 at commit 2f55a1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44942/


SparkQA commented Jun 30, 2021

Test build #140414 has finished for PR 32883 at commit fd44620.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -98,6 +98,7 @@ case class AdaptiveSparkPlanExec(
ReuseAdaptiveSubquery(context.subqueryCache),
// Skew join does not handle `CustomShuffleReader` so needs to be applied first.
OptimizeSkewedJoin,
OptimizeSkewedPartitions,
Contributor

I'd make it more clear: OptimizeSkewInRebalancePartitions

val ADAPTIVE_OPTIMIZE_SKEWS_IN_REBALANCE_PARTITIONS_ENABLED =
buildConf("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled")
.doc(s"When true and '${ADAPTIVE_EXECUTION_ENABLED.key}' is true, Spark will optimize the " +
"skewed shuffle partition to some small partitions according to the target size " +
@cloud-fan (Contributor) commented Jun 30, 2021

skewed shuffle partitions in RebalancePartitions and split them to smaller ones ..., then we can remove the Note that, ... part.

import org.apache.spark.sql.internal.SQLConf

/**
* A rule to optimize the skewed shuffle partitions based on the map output statistics, which can
Contributor

skewed shuffle partitions in RebalancePartitions

* and the reduce side looks like:
* (without this rule) r1[m0-b1, m1-b1, m2-b1]
* / \
* r0:[m0-b0, m1-b0, m2-b0], r1:[m0-b1], r2:[m1-b1], r3:[m2-b1], r4[m0-b2, m1-b2, m2-b2]
Contributor

I'd name them r0, r1-0, r1-1, r1-2, r2
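The suggested naming can be generated mechanically: each skewed reduce partition r&lt;i&gt; that is read from all mappers becomes one partial spec per mapper, r&lt;i&gt;-&lt;j&gt;, while normal partitions keep their r&lt;i&gt; name. A hypothetical Python illustration of the scheme (not Spark code):

```python
def split_reader_specs(num_reducers, skewed, num_mappers):
    """Name the reader partitions after splitting: a skewed reduce
    partition r<i> yields one partial spec per mapper, r<i>-<j>."""
    specs = []
    for r in range(num_reducers):
        if r in skewed:
            specs += [f"r{r}-{m}" for m in range(num_mappers)]
        else:
            specs.append(f"r{r}")
    return specs

# Three reducers, partition 1 skewed, three mappers (as in the diagram):
print(split_reader_specs(3, {1}, 3))   # → ['r0', 'r1-0', 'r1-1', 'r1-2', 'r2']
```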

* after split. Create a list of `PartialMapperPartitionSpec` for skewed partition and
* create `CoalescedPartition` for normal partition.
*/
def optimizeSkewedPartitions(
Contributor

If this is only called in the new rule, shall we move it to the new rule?

@@ -1806,7 +1806,8 @@ class AdaptiveQueryExecSuite
}

test("SPARK-35650: Use local shuffle reader if can not coalesce number of partitions") {
withSQLConf(SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "2") {
withSQLConf(SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "2",
Contributor

can we just disable partition coalesce in this test, so that we don't need to tune ADVISORY_PARTITION_SIZE_IN_BYTES which affects skew handling?

withTempView("v") {
withSQLConf(
SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "false",
Contributor

can we turn it on to make sure the new optimization works well with it?

Comment on lines 67 to 73
plan match {
case shuffle: ShuffleQueryStageExec
if supportedShuffleOrigins.contains(shuffle.shuffle.shuffleOrigin) =>
optimizeSkewedPartitions(shuffle)
case _ => plan
}
}
Member

Could we make it work only with DataWritingCommandExec and InsertIntoDataSourceExec? For example:

    if (!conf.getConf(SQLConf.ADAPTIVE_OPTIMIZE_SKEWS_IN_REBALANCE_PARTITIONS_ENABLED)) {
      return plan
    }

    plan match {
      case d @ DataWritingCommandExec(_, child) if supportOptimization(d) =>
        handleSkewed(d, child)
      case i @ InsertIntoDataSourceExec(child, _, _, _, _) if supportOptimization(i) =>
        handleSkewed(i, child)
      case _ => plan
    }
  }


SparkQA commented Jun 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44953/


SparkQA commented Jun 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44953/

@@ -1806,7 +1806,8 @@ class AdaptiveQueryExecSuite
}

test("SPARK-35650: Use local shuffle reader if can not coalesce number of partitions") {
withSQLConf(SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "2") {
withSQLConf(SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "false",
SQLConf.ADAPTIVE_OPTIMIZE_SKEWS_IN_REBALANCE_PARTITIONS_ENABLED.key -> "false") {
Contributor

do we need to disable this feature in this test?

Contributor Author

No need; the default value of the target size is big enough.

}

withSQLConf(SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "100") {
// partition size [0,258,72,72,72]
@cloud-fan (Contributor) commented Jun 30, 2021

can we tune the size a little more, so that coalesce also applies?

Contributor Author

yea, tuned it up to 150
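The interplay being tested (partition sizes [0, 258, 72, 72, 72] with an advisory size at which both splitting and coalescing trigger) can be sketched as one pass that splits oversized partitions and coalesces small adjacent ones. A simplified Python model of the combined behavior, not the actual Spark logic:

```python
def rebalance(sizes, target):
    """Split partitions larger than `target` into near-equal pieces and
    coalesce runs of small adjacent partitions up to `target` (sketch)."""
    out, acc = [], 0
    for s in sizes:
        if s > target:
            if acc:                      # flush any pending coalesced group
                out.append(acc)
                acc = 0
            n = -(-s // target)          # ceil(s / target) pieces
            base, rem = divmod(s, n)
            out += [base + (1 if i < rem else 0) for i in range(n)]
        else:
            if acc + s > target and acc:
                out.append(acc)
                acc = 0
            acc += s                     # zero-size partitions merge away
    if acc:
        out.append(acc)
    return out

print(rebalance([0, 258, 72, 72, 72], 100))   # → [86, 86, 86, 72, 72, 72]
print(rebalance([0, 258, 72, 72, 72], 150))   # → [129, 129, 144, 72]
```

At a target of 100 only the 258-byte partition is split; raising the target to 150 also lets coalescing combine two of the 72-byte partitions, which is what the reviewer asked the test to exercise.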


SparkQA commented Jun 30, 2021

Test build #140445 has finished for PR 32883 at commit 8e706ae.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 30, 2021

Test build #140439 has finished for PR 32883 at commit 7666145.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44959/


SparkQA commented Jun 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44959/

@cloud-fan (Contributor)

thanks, merging to master!

@cloud-fan cloud-fan closed this in d46c1e3 Jun 30, 2021
@ulysses-you (Contributor Author)

thank you all !

@ulysses-you ulysses-you deleted the expand-partition branch July 1, 2021 01:34
domybest11 pushed a commit to domybest11/spark that referenced this pull request Jun 15, 2022
[SPARK-35725][SQL] Support optimize skewed partitions in RebalancePartitions


Closes apache#32883 from ulysses-you/expand-partition.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>