[SPARK-9709] [SQL] Avoid starving unsafe operators that use sort #8011

andrewor14 · 2015-08-06T22:33:55Z

The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an IOException("Unable to acquire X bytes of memory") thrown by UnsafeExternalSorter.

The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions.
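To make the ordering problem concrete, here is a minimal toy model of the idea, not the code in this patch. `PagePool`, `ToySorter`, and `PreparationSketch` are hypothetical names invented for illustration, and the pool size of 4 pages is arbitrary. The sketch assumes the upstream sort greedily takes every free page, which is the worst case described above.

```scala
// Toy model only: none of these names are Spark APIs.
final class PagePool(totalPages: Int) {
  private var free = totalPages
  // Try to take one page; returns false when the pool is exhausted.
  def tryAcquire(): Boolean = synchronized {
    if (free > 0) { free -= 1; true } else false
  }
}

final class ToySorter(name: String, pool: PagePool) {
  private var pages = 0
  // Reserve a single page so this sorter can always make progress.
  // Does nothing if a page is already held.
  def reserveFirstPage(): Unit = {
    if (pages == 0) {
      if (!pool.tryAcquire()) {
        throw new java.io.IOException(s"$name: unable to acquire a page")
      }
      pages += 1
    }
  }
  // Greedily take every page still available, as an upstream sort might.
  def acquireAllAvailable(): Unit = while (pool.tryAcquire()) pages += 1
  def pageCount: Int = pages
}

object PreparationSketch {
  // Returns true when the downstream sorter ended up holding a page.
  def run(prepareFirst: Boolean): Boolean = {
    val pool = new PagePool(totalPages = 4)
    val upstream = new ToySorter("upstream", pool)
    val downstream = new ToySorter("downstream", pool)
    try {
      if (prepareFirst) downstream.reserveFirstPage() // preparation step
      upstream.acquireAllAvailable()                  // stands in for computing the parent's partitions
      downstream.reserveFirstPage()                   // no-op if already reserved
      true
    } catch {
      case _: java.io.IOException => false
    }
  }
}
```

In this model, `PreparationSketch.run(prepareFirst = false)` fails because the upstream sorter drains the pool first, while `run(prepareFirst = true)` succeeds because the downstream sorter holds its page before the upstream work starts.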

The MapPartitionsWithPreparationRDDSuite simulates the condition we are trying to fix, which is that the child can acquire memory before the parent.

andrewor14 · 2015-08-06T22:35:45Z

core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala

@@ -124,7 +124,7 @@ private[spark] class ShuffleMemoryManager(maxMemory: Long) extends Logging {
  }
 }

-private object ShuffleMemoryManager {
+private[spark] object ShuffleMemoryManager {


Widened to `private[spark]` because it is used in tests.

andrewor14 · 2015-08-06T22:44:25Z

@rxin @JoshRosen

rxin · 2015-08-06T23:00:11Z

High level approach looks good.

It would be great to simplify the test case so that it does not rely on memory management. Just use an `AtomicInteger` somewhere.
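One way to read this suggestion is sketched below. It is not the test in this PR: `OrderingSketch` and `computeWithPreparation` are hypothetical stand-ins, and the only real API used is `java.util.concurrent.atomic.AtomicInteger`. The counter records which of the two steps ran first, so the test checks ordering without touching any memory manager.

```scala
import java.util.concurrent.atomic.AtomicInteger

object OrderingSketch {
  // Hypothetical stand-in for an RDD-like compute step that runs `prepare`
  // before it touches the parent's iterator.
  def computeWithPreparation[T, P, U](
      parent: () => Iterator[T],
      prepare: () => P)(
      f: (P, Iterator[T]) => Iterator[U]): Iterator[U] = {
    val prepared = prepare() // runs first
    f(prepared, parent())    // the parent is only computed afterwards
  }

  // Returns (order observed by prepare, order observed by the parent).
  def run(): (Int, Int) = {
    val counter = new AtomicInteger(0)
    var parentOrder = -1
    val parent = () => { parentOrder = counter.incrementAndGet(); Iterator(1, 2, 3) }
    val prepare = () => counter.incrementAndGet()
    val out = computeWithPreparation(parent, prepare) { (prepOrder: Int, it: Iterator[Int]) =>
      it.map(_ => prepOrder) // tag every element with the order prepare saw
    }
    val prepareOrder = out.next()
    (prepareOrder, parentOrder)
  }
}
```

A test built this way would assert that the preparation step observed a smaller counter value than the parent computation did.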

SparkQA · 2015-08-07T01:06:23Z

Test build #40091 has finished for PR 8011 at commit 5d5afdf.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2015-08-07T01:23:56Z

LGTM

andrewor14 · 2015-08-07T01:42:39Z

retest this please

SparkQA · 2015-08-07T01:43:32Z

Test build #40093 timed out for PR 8011 at commit 0b07782 after a configured wait of 175m.

SparkQA · 2015-08-07T01:57:57Z

Test build #40103 has finished for PR 8011 at commit db8b6e0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an `IOException("Unable to acquire X bytes of memory")` thrown by `UnsafeExternalSorter`.

The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions.

Author: Andrew Or <andrew@databricks.com>

Closes #8011 from andrewor14/unsafe-starve-memory and squashes the following commits:

35b69a4 [Andrew Or] Simplify test
0b07782 [Andrew Or] Minor: update comments
5d5afdf [Andrew Or] Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory
254032e [Andrew Or] Add tests
234acbd [Andrew Or] Reserve a page in sorter when preparing each partition
b889e08 [Andrew Or] MapPartitionsWithPreparationRDD

(cherry picked from commit 014a9f9)
Signed-off-by: Reynold Xin <rxin@databricks.com>

SparkQA · 2015-08-07T02:25:03Z

Test build #40107 has finished for PR 8011 at commit 35b69a4.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-07T04:07:50Z

Test build #1397 has finished for PR 8011 at commit 35b69a4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-07T04:25:34Z

Test build #1396 has finished for PR 8011 at commit 35b69a4.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-07T04:38:29Z

Test build #40124 timed out for PR 8011 at commit 35b69a4 after a configured wait of 175m.

This is the sister patch to #8011, but for aggregation. In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or <andrew@databricks.com>

Closes #8038 from andrewor14/unsafe-starve-memory-agg.

(cherry picked from commit e011079)
Signed-off-by: Reynold Xin <rxin@databricks.com>
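The effect of acquiring a page in the constructor can be illustrated with a small toy model. This is a sketch under assumptions, not the `BytesToBytesMap` implementation: `Budget`, `EagerMap`, and `EagerMapSketch` are hypothetical names, and the upstream work is modeled as a loop that drains whatever pages remain.

```scala
object EagerMapSketch {
  // Toy page budget shared by the operators in one task (not a Spark class).
  final class Budget(var freePages: Int) {
    def take(): Boolean = if (freePages > 0) { freePages -= 1; true } else false
  }

  // Hypothetical map that grabs its first page when it is constructed.
  final class EagerMap(budget: Budget) {
    require(budget.take(), "no page available at construction time")
    def pagesHeld: Int = 1
  }

  // Build the map first, then let the upstream work drain the budget.
  // Returns (pages held by the map, pages left in the budget).
  def run(totalPages: Int): (Int, Int) = {
    val budget = new Budget(totalPages)
    val map = new EagerMap(budget) // reserves 1 page up front
    while (budget.take()) {}       // upstream consumes everything that is left
    (map.pagesHeld, budget.freePages)
  }
}
```

Because the map is constructed before the upstream loop runs, it still holds its page even after the budget reaches zero.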

Since we do not need to reserve a page before calling compute(), MapPartitionsWithPreparationRDD is no longer needed. This PR essentially reverts #8543, #8511, #8038, and #8011.

Author: Davies Liu <davies@databricks.com>

Closes #9381 from davies/remove_prepare2.

Andrew Or added 4 commits August 6, 2015 12:05

MapPartitionsWithPreparationRDD

b889e08

Reserve a page in sorter when preparing each partition

234acbd

Add tests

254032e

The MapPartitionsWithPreparationRDDSuite simulates the condition we are trying to fix, which is that the child can acquire memory before the parent.

Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory

5d5afdf

andrewor14 force-pushed the unsafe-starve-memory branch from fdd3c92 to 5d5afdf August 6, 2015 22:35

andrewor14 reviewed Aug 6, 2015

Minor: update comments

0b07782

andrewor14 force-pushed the unsafe-starve-memory branch from 7c34b09 to 0b07782 August 6, 2015 22:38

Simplify test

35b69a4

andrewor14 force-pushed the unsafe-starve-memory branch from db8b6e0 to 35b69a4 August 6, 2015 23:39

asfgit closed this in 014a9f9 Aug 7, 2015

andrewor14 deleted the unsafe-starve-memory branch August 7, 2015 04:04

andrewor14 mentioned this pull request Aug 7, 2015

[SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation #8038

Closed

davies mentioned this pull request Oct 30, 2015

[SPARK-11423] remove MapPartitionsWithPreparationRDD #9381

Closed
