[SPARK-29657][CORE] Iterator spill supporting radix sort with null prefix #26323
spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java Lines 364 to 376 in 44a27bd
spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java Lines 519 to 524 in 44a27bd
The following is a log of an error we encountered in the production environment. [Executor task launch worker for task 66055] INFO TaskMemoryManager: Memory used in task 66055
@JoshRosen @davies
@cloud-fan Can you look at this question? Thank you.
@wangyum @HyukjinKwon @srowen
I don't know enough to evaluate the change. It looks like it touches some important code.
OK to test
@@ -518,19 +520,52 @@ public int getNumRecords() {
public long spill() throws IOException {
Low-level code is hard to review, as nobody remembers all the details all the time. Can you briefly explain the spill process to help people review?
Before SPARK-14851, UnsafeInMemorySorter.getSortedIterator() always returned an UnsafeInMemorySorter.SortedIterator.
Now, when radix sort is turned on and some key prefixes are null, the returned type is UnsafeExternalSorter.ChainedIterator(UnsafeInMemorySorter.SortedIterator).
UnsafeExternalSorter.SpillableIterator#spill cannot spill in this situation.
When UnsafeExternalSorter#getSortedIterator is called, part of the most recently written data is still held in memory.
When the task then needs more execution memory, the spill fails. The call path is:
TaskMemoryManager#acquireExecutionMemory->UnsafeExternalSorter#spill->UnsafeExternalSorter.SpillableIterator#spill
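The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the classes below are hypothetical stand-ins that borrow the names SortedIterator and ChainedIterator, share no code with Spark, and assume each record occupies 8 bytes. The point is that a spill routine which only recognizes a plain SortedIterator releases nothing once the upstream is a chain.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;

// Toy model only: hypothetical stand-ins, not Spark's implementation.
class SpillSketch {
    /** Stand-in for UnsafeInMemorySorter.SortedIterator. */
    static final class SortedIterator implements Iterator<Long> {
        private final long[] records;
        private int pos = 0;
        SortedIterator(long... records) { this.records = records; }
        @Override public boolean hasNext() { return pos < records.length; }
        @Override public Long next() {
            if (!hasNext()) throw new NoSuchElementException();
            return records[pos++];
        }
        int remaining() { return records.length - pos; }
    }

    /** Stand-in for UnsafeExternalSorter.ChainedIterator. */
    static final class ChainedIterator implements Iterator<Long> {
        private final Queue<SortedIterator> parts;
        ChainedIterator(List<SortedIterator> parts) { this.parts = new ArrayDeque<>(parts); }
        @Override public boolean hasNext() {
            // Drop exhausted parts so the head always has a record, if any is left.
            while (!parts.isEmpty() && !parts.peek().hasNext()) parts.remove();
            return !parts.isEmpty();
        }
        @Override public Long next() {
            if (!hasNext()) throw new NoSuchElementException();
            return parts.peek().next();
        }
    }

    /** Bytes a spill could release; assumes 8 bytes per record (illustrative). */
    static long spillableBytes(Iterator<Long> upstream) {
        // Only a plain SortedIterator is recognized; anything else releases nothing.
        if (!(upstream instanceof SortedIterator)) return 0L;
        return 8L * ((SortedIterator) upstream).remaining();
    }
}
```

In this model, a caller that asks for more memory gets nothing back from a chained upstream, even though the chain holds in-memory records.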
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Support spill for a ChainedIterator that wraps SortedIterator.
Why are the changes needed?
In the case of radix sort, when some of the records passed to insertRecord have a null keyPrefix, the iterator returned by getSortedIterator is a ChainedIterator.
Currently ChainedIterator does not support spill, so UnsafeExternalSorter holds on to a lot of execution memory, allocatePage fails, and a SparkOutOfMemoryError is thrown:
Unable to acquire xxx bytes of memory, got 0
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test.
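To make the "Why are the changes needed?" section concrete, here is a minimal sketch of how null key prefixes lead to more than one sorted range. It assumes that null-prefix records are kept apart from the sorted non-null ones and that the two ranges are then chained. The class, the method, and the nullsFirst flag are hypothetical names for illustration, and Collections.sort stands in for the radix sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: not Spark's code; names and structure are assumptions.
class NullPrefixSketch {
    /** Returns the sorted ranges that a chained iterator would walk in order. */
    static List<List<Long>> sortedParts(List<Long> prefixes, boolean nullsFirst) {
        List<Long> nulls = new ArrayList<>();
        List<Long> nonNulls = new ArrayList<>();
        for (Long p : prefixes) {
            if (p == null) nulls.add(p); else nonNulls.add(p);
        }
        Collections.sort(nonNulls); // stand-in for the radix sort of non-null prefixes

        List<List<Long>> parts = new ArrayList<>();
        if (nulls.isEmpty()) {
            parts.add(nonNulls);     // single range: a plain sorted iterator suffices
            return parts;
        }
        // Two ranges: this is the case that needs a chained iterator.
        if (nullsFirst) { parts.add(nulls); parts.add(nonNulls); }
        else { parts.add(nonNulls); parts.add(nulls); }
        return parts;
    }
}
```

With no null prefixes the result is a single range, which corresponds to the plain SortedIterator case. A single null prefix is enough to produce two ranges.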