Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-9709] [SQL] Avoid starving unsafe operators that use sort #8011

Closed
wants to merge 6 commits into from

Conversation

andrewor14
Copy link
Contributor

The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an IOException("Unable to acquire X bytes of memory") thrown by UnsafeExternalSorter.

The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions.

Andrew Or added 4 commits August 6, 2015 12:05
The MapPartitionsWithPreparationRDDSuite simulates the condition
we are trying to fix, which is that the child can acquire memory
before the parent.
@@ -124,7 +124,7 @@ private[spark] class ShuffleMemoryManager(maxMemory: Long) extends Logging {
}
}

private object ShuffleMemoryManager {
private[spark] object ShuffleMemoryManager {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used in tests

@andrewor14
Copy link
Contributor Author

@rxin @JoshRosen

@rxin
Copy link
Contributor

rxin commented Aug 6, 2015

High level approach looks good.

Would be great to simplify the test case to not rely on memory management. Just use an atomicinteger somewhere.

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40091 has finished for PR 8011 at commit 5d5afdf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Aug 7, 2015

LGTM

@andrewor14
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40093 timed out for PR 8011 at commit 0b07782 after a configured wait of 175m.

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40103 has finished for PR 8011 at commit db8b6e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 7, 2015
The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an `IOException("Unable to acquire X bytes of memory")` thrown by `UnsafeExternalSorter`.

The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions.

Author: Andrew Or <andrew@databricks.com>

Closes #8011 from andrewor14/unsafe-starve-memory and squashes the following commits:

35b69a4 [Andrew Or] Simplify test
0b07782 [Andrew Or] Minor: update comments
5d5afdf [Andrew Or] Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory
254032e [Andrew Or] Add tests
234acbd [Andrew Or] Reserve a page in sorter when preparing each partition
b889e08 [Andrew Or] MapPartitionsWithPreparationRDD

(cherry picked from commit 014a9f9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in 014a9f9 Aug 7, 2015
@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40107 has finished for PR 8011 at commit 35b69a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 andrewor14 deleted the unsafe-starve-memory branch August 7, 2015 04:04
@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #1397 has finished for PR 8011 at commit 35b69a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #1396 has finished for PR 8011 at commit 35b69a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40124 timed out for PR 8011 at commit 35b69a4 after a configured wait of 175m.

asfgit pushed a commit that referenced this pull request Aug 12, 2015
This is the sister patch to #8011, but for aggregation.

In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or <andrew@databricks.com>

Closes #8038 from andrewor14/unsafe-starve-memory-agg.

(cherry picked from commit e011079)
Signed-off-by: Reynold Xin <rxin@databricks.com>
asfgit pushed a commit that referenced this pull request Aug 12, 2015
This is the sister patch to #8011, but for aggregation.

In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or <andrew@databricks.com>

Closes #8038 from andrewor14/unsafe-starve-memory-agg.
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
This is the sister patch to apache#8011, but for aggregation.

In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or <andrew@databricks.com>

Closes apache#8038 from andrewor14/unsafe-starve-memory-agg.
asfgit pushed a commit that referenced this pull request Oct 30, 2015
Since we do not need to preserve a page before calling compute(), MapPartitionsWithPreparationRDD is not needed anymore.

This PR basically revert #8543, #8511, #8038, #8011

Author: Davies Liu <davies@databricks.com>

Closes #9381 from davies/remove_prepare2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants