[SPARK-10341] [SQL] fix memory starving in unsafe SMJ #8511
Conversation
ping @JoshRosen
Test build #41759 has finished for PR 8511 at commit
Test build #1703 has finished for PR 8511 at commit
Test build #1704 has finished for PR 8511 at commit
Test build #1705 has finished for PR 8511 at commit
/cc @andrewor14
@@ -24,6 +24,8 @@ import org.apache.spark.{Partition, Partitioner, TaskContext}
/**
 * An RDD that applies a user provided function to every partition of the parent RDD, and
 * additionally allows the user to prepare each partition before computing the parent partition.
 *
 * TODO(davies): remove this once SPARK-10342 is fixed
Based on this comment, it is not clear what needs to be removed (this class, or the changes in this PR). Can you add a comment in SPARK-10342 about what needs to be removed?
It is also unclear whether we would want to remove this even with better memory management; you might still want it then. I would just remove this comment.
On Aug 29, 2015, at 1:37 PM, Yin Huai notifications@github.com wrote:
> In core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithPreparationRDD.scala, #8511 (comment):
> Based on this comment, it is not clear what needs to be removed (this class, or the changes in this PR). Can you add a comment in SPARK-10342 about what needs to be removed?
Test build #41785 has finished for PR 8511 at commit
val preparedArgument = preparePartition()
prepare()
// The same RDD could be called multiple times in one task, each call of compute() should
// have sep
Incomplete comment?
retest this please
@@ -73,6 +73,13 @@ private[spark] abstract class ZippedPartitionsBaseRDD[V: ClassTag](
super.clearDependencies()
rdds = null
}

protected def tryPrepareChildren() {
This should have an explicit Unit return type. Also, can you add a javadoc to this:
/**
 * Call the prepare method of every child that has one.
 * This is needed for reserving execution memory in advance.
 */
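For context, the suggested shape could look like the following standalone sketch. The `WithPrepare`, `FakeRDD`, and `FakeZippedRDD` names are simplified stand-ins for illustration only, not Spark's real classes:

```scala
// Simplified stand-ins for illustration; not Spark's real RDD classes.
trait WithPrepare {
  def prepare(): Unit
}

class FakeRDD(val name: String) {
  var prepared = false
}

class FakePreparedRDD(name: String) extends FakeRDD(name) with WithPrepare {
  override def prepare(): Unit = { prepared = true }
}

class FakeZippedRDD(val rdds: Seq[FakeRDD]) {
  /**
   * Call the prepare method of every child that has one.
   * This is needed for reserving execution memory in advance.
   */
  protected def tryPrepareChildren(): Unit = {
    rdds.collect { case p: WithPrepare => p.prepare() }
  }

  def compute(): Unit = {
    tryPrepareChildren() // reserve memory for every child before any child computes
  }
}
```

The point of the pattern: the zipped RDD prepares all of its children before computing any of them, so no single child can claim resources the others still need.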
@davies I think the changes here fix the problem, but I find it a little arbitrary that
This essentially forces SMJ to call the
Will something like this work?
@andrewor14 Not only SortMergeJoin has this problem; SortMergeOuterJoin also has it, and any join that depends on TungstenAggregation or TungstenSort will have this problem. I think ZippedPartitionsRDD is the only place that could introduce multiple MapPartitionsWithPreparationRDDs in the same task. Correct me if there are others.
Test build #41797 has finished for PR 8511 at commit
Test build #41803 has finished for PR 8511 at commit
Test build #1706 has finished for PR 8511 at commit
Hm, my only concern with doing the
@andrewor14 I think getNarrowAncestors will not traverse the entire lineage; it only visits the RDDs within the current stage. It is still not cheap if the RDD is a union of thousands of RDDs. We could roll back to use
Test build #41838 has finished for PR 8511 at commit
LGTM, will merge once tests pass.
I'm going to optimistically merge this since most of the tests passed.
In SMJ, the first ExternalSorter could consume all the memory before spilling; then the second cannot even acquire its first page. Before we have a better memory allocator, SMJ should call prepare() before calling any compute() of its children.

cc rxin JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes #8511 from davies/smj_memory.

(cherry picked from commit 540bdee)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@@ -38,12 +39,28 @@ private[spark] class MapPartitionsWithPreparationRDD[U: ClassTag, T: ClassTag, M

override def getPartitions: Array[Partition] = firstParent[T].partitions

// In certain join operations, prepare can be called on the same partition multiple times.
// In this case, we need to ensure that each call to compute gets a separate prepare argument.
private[this] var preparedArguments: ArrayBuffer[M] = new ArrayBuffer[M]
I just noticed this can be a val. We can fix this in a follow-up patch.
Will fix this in a follow-up PR.
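The queue semantics being discussed can be sketched in isolation. This is a hypothetical simplification (the `PreparedPartition` class is invented for illustration, not Spark's actual MapPartitionsWithPreparationRDD): each prepare() enqueues one argument, and each compute() dequeues its own, so repeated prepare/compute calls on the same partition in one task do not share state:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical simplification of the prepare/compute handshake.
class PreparedPartition[M](preparePartition: () => M) {
  // One pending entry per prepare() call; can be a val, as the review notes.
  private[this] val preparedArguments: ArrayBuffer[M] = new ArrayBuffer[M]

  def prepare(): Unit = {
    preparedArguments += preparePartition()
  }

  def compute(): M = {
    if (preparedArguments.isEmpty) {
      preparePartition() // no prepare() was called first; prepare on demand
    } else {
      preparedArguments.remove(0) // consume the argument from the matching prepare()
    }
  }
}
```

With two prepare() calls followed by three compute() calls, the first two computes consume the two queued arguments in order, and the third falls back to preparing on demand.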
Test build #41839 has finished for PR 8511 at commit
In SMJ, the first ExternalSorter could consume all the memory before spilling; then the second cannot even acquire its first page.
Before we have a better memory allocator, SMJ should call prepare() before calling any compute() of its children.
cc @rxin @JoshRosen
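The starvation scenario above can be modeled with a toy memory manager. All names here (`ToyMemoryManager`, `ToySorter`) are hypothetical and this is not Spark's memory management code; it only illustrates why reserving a page per sorter up front prevents the second sorter from being starved:

```scala
// Toy model of the starvation scenario; not Spark's memory manager.
class ToyMemoryManager(var freePages: Int) {
  // Grant up to n pages, returning how many were actually granted.
  def acquire(n: Int): Int = {
    val granted = math.min(n, freePages)
    freePages -= granted
    granted
  }
}

class ToySorter(mm: ToyMemoryManager) {
  var pages = 0
  // The "prepare()" step: reserve the first page up front.
  def reserveFirstPage(): Unit = { pages += mm.acquire(1) }
  // The "compute()" step: grab every remaining free page, as a greedy sorter would.
  def sortGreedily(): Unit = { pages += mm.acquire(mm.freePages) }
}
```

With 8 free pages and no up-front reservation, the first sorter takes all 8 and the second gets none. If both reserve their first page before either sorts, the second is guaranteed at least one page, which is what calling prepare() on all children before any compute() buys SMJ.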