Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter #29785

Closed
wants to merge 4 commits into from

Conversation

tomvanbussel
Copy link
Contributor

@tomvanbussel tomvanbussel commented Sep 17, 2020

What changes were proposed in this pull request?

This PR changes UnsafeExternalSorter to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in UnsafeInMemorySorter. Instead the new pointer array is allocated whenever the next record is inserted into the sorter.

Why are the changes needed?

Without this change the UnsafeExternalSorter could throw an OOM while spilling. The following sequence of events would have triggered an OOM:

  1. UnsafeExternalSorter runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
  2. TaskMemoryManager tries to allocate the memory backing the new large array using MemoryManager, but MemoryManager is only willing to return most but not all of the memory requested.
  3. TaskMemoryManager asks UnsafeExternalSorter to spill, which causes UnsafeExternalSorter to spill the current run to disk, to free its record pages and to reset its UnsafeInMemorySorter.
  4. UnsafeInMemorySorter frees the old pointer array, and tries to allocate a new small pointer array.
  5. TaskMemoryManager tries to allocate the memory backing the small array using MemoryManager, but MemoryManager is unwilling to give it any memory, as the TaskMemoryManager is still holding on to the memory it got for the new large array.
  6. TaskMemoryManager again asks UnsafeExternalSorter to spill, but this time there is nothing to spill.
  7. UnsafeInMemorySorter receives less memory than it requested, and causes a SparkOutOfMemoryError to be thrown, which causes the current task to fail.

With the changes in the PR the following will happen instead:

  1. UnsafeExternalSorter runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
  2. TaskMemoryManager tries to allocate the memory backing the new large array using MemoryManager, but MemoryManager is only willing to return most but not all of the memory requested.
  3. TaskMemoryManager asks UnsafeExternalSorter to spill, which causes UnsafeExternalSorter to spill the current run to disk, to free its record pages and to reset its UnsafeInMemorySorter.
  4. UnsafeInMemorySorter frees the old pointer array.
  5. TaskMemoryManager returns control to UnsafeExternalSorter.growPointerArrayIfNecessary (either by returning the the new large array or by throwing a SparkOutOfMemoryError).
  6. UnsafeExternalSorter either frees the new large array or it ignores the SparkOutOfMemoryError depending on what happened in the previous step.
  7. UnsafeExternalSorter successfully allocates a new small pointer array and operation continues as normal.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tests were added in UnsafeExternalSorterSuite and UnsafeInMemorySorterSuite.

@tomvanbussel
Copy link
Contributor Author

cc @hvanhovell @cloud-fan

@cloud-fan
Copy link
Contributor

OK to test

@cloud-fan
Copy link
Contributor

I vaguely remember that, the memory manager won't ask a memory consumer to spill when the memory consumer is requesting memory right now. @tomvanbussel can you double-check the related code?

@tomvanbussel
Copy link
Contributor Author

@cloud-fan I checked the code (TaskMemoryManager.acquireExecutionMemory()) and it seems that the memory will first ask the other memory consumers to spill before it asks the memory consumer that requested the memory to spill. But it does ask the requesting memory consumer to spill if necessary.

// checkstyle.on: RegexpSinglelineJava
if (array != null) {
if (newArray.size() < array.size()) {
// checkstyle.off: RegexpSinglelineJava
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why this is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No clue. The lines with the comments are newer than the line between them, so I'm sure they're good for something...

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PR's description no no longer allocate. Double no compensate each other? ;-)

@tomvanbussel
Copy link
Contributor Author

Fixed, thanks Max ;)

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Sep 20, 2020

Test build #128899 has finished for PR 29785 at commit a74aa0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

@tomvanbussel thanks for putting the details steps about how to trigger the bug in the PR description! Can you also put the detailed steps after the fix?

@tomvanbussel
Copy link
Contributor Author

@cloud-fan Done :)

@Ngone51
Copy link
Member

Ngone51 commented Sep 22, 2020

TaskMemoryManager tries to allocate the memory backing the small array using MemoryManager, but MemoryManager is unwilling to give it any memory, as the TaskMemoryManager is still holding on to the memory it got for the large array.

Doesn't the UnsafeInMemorySorter release the old large point array before requiring the new small point array? Why would TaskMemoryManager still hold the memory of the large array(I assume you mean the large point array)?

@tomvanbussel
Copy link
Contributor Author

@Ngone51 There are three arrays in the example:

  1. The old pointer array.
  2. A new large pointer array that's allocated in growPointerArrayIfNecessary() to replace the old pointer array. The allocation of this array triggered the spill.
  3. A new small pointer array that's allocated while spilling to replace the old pointer array. The allocation of this array causes the OOM exception.

The old pointer array has been released, but TaskMemoryManager is still trying to allocate the new large pointer array when spilling is triggered, and still holds onto the memory it got from the MemoryManager.

@Ngone51
Copy link
Member

Ngone51 commented Sep 23, 2020

The old pointer array has been released, but TaskMemoryManager is still trying to allocate the new large pointer array when spilling is triggered, and still holds onto the memory it got from the MemoryManager.

But from the description of step 3,4,5, the TaskMemoryManager is trying to allocate the new small pointer array rather than the large one:

  1. TaskMemoryManager asks UnsafeExternalSorter to spill, which causes UnsafeExternalSorter to spill the current run to disk, to free its record pages and to reset its UnsafeInMemorySorter.
  2. UnsafeInMemorySorter frees the old pointer array, and tries to allocate a new small pointer array.
  3. TaskMemoryManager tries to allocate the memory backing the small array using MemoryManager, but MemoryManager is unwilling to give it any memory, as the TaskMemoryManager is still holding on to the memory it got for the new large array.

IIUC, it should be the small pointer array...

@tomvanbussel
Copy link
Contributor Author

@Ngone51 Thanks for clarifying! It's both trying to allocate a new large pointer array and a new small pointer array. The allocation of the new large pointer array was what triggered the initial spill, and the new small pointer is allocated while spilling. While it's spilling (and the new small pointer array is allocated), the TaskMemoryManager is still holding to the memory for the partially allocated new large pointer array. Any ideas on how I can clarify this in the description?

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cloud-fan
Copy link
Contributor

cloud-fan commented Sep 25, 2020

Is it possible to refactor it a little bit to make the logic simpler? It's really tricky that we must be careful at each step, in case something was spilled right before the step.

I'm fine if refactoring is too hard and merge the PR as it is.

@tomvanbussel
Copy link
Contributor Author

@cloud-fan I don't really see a way to simply this without making invasive changes to memory management in Spark. For instance we could simplify this if we had some way to reserve memory up front. That seems like a lot of work though, so for now I have tried to keep it understand by adding lots of comments.

@hvanhovell
Copy link
Contributor

hvanhovell commented Sep 29, 2020

Alright merging to master/3.0.

hvanhovell pushed a commit that referenced this pull request Sep 29, 2020
…nalSorter

### What changes were proposed in this pull request?

This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter.

### Why are the changes needed?

Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array.
5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array.
6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill.
7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail.

With the changes in the PR the following will happen instead:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array.
5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`).
6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step.
7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`.

Closes #29785 from tomvanbussel/SPARK-32901.

Authored-by: Tom van Bussel <tom.vanbussel@databricks.com>
Signed-off-by: herman <herman@databricks.com>
(cherry picked from commit f167002)
Signed-off-by: herman <herman@databricks.com>
tomvanbussel added a commit to tomvanbussel/spark that referenced this pull request Sep 30, 2020
…nalSorter

This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter.

Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array.
5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array.
6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill.
7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail.

With the changes in the PR the following will happen instead:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array.
5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`).
6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step.
7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal.

No

Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`.

Closes apache#29785 from tomvanbussel/SPARK-32901.

Authored-by: Tom van Bussel <tom.vanbussel@databricks.com>
Signed-off-by: herman <herman@databricks.com>
hvanhovell pushed a commit that referenced this pull request Oct 8, 2020
…ExternalSorter

Backport of #29785 to Spark 2.4

### What changes were proposed in this pull request?

This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter.

### Why are the changes needed?

Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array.
5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array.
6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill.
7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail.

With the changes in the PR the following will happen instead:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array.
5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`).
6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step.
7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`.

Closes #29910 from tomvanbussel/backport-SPARK-32901.

Authored-by: Tom van Bussel <tom.vanbussel@databricks.com>
Signed-off-by: herman <herman@databricks.com>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…nalSorter

### What changes were proposed in this pull request?

This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter.

### Why are the changes needed?

Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array.
5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array.
6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill.
7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail.

With the changes in the PR the following will happen instead:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array.
5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`).
6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step.
7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`.

Closes apache#29785 from tomvanbussel/SPARK-32901.

Authored-by: Tom van Bussel <tom.vanbussel@databricks.com>
Signed-off-by: herman <herman@databricks.com>
(cherry picked from commit f167002)
Signed-off-by: herman <herman@databricks.com>
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…ExternalSorter

Backport of apache#29785 to Spark 2.4

This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter.

Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array.
5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array.
6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill.
7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail.

With the changes in the PR the following will happen instead:

1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
4. `UnsafeInMemorySorter` frees the old pointer array.
5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`).
6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step.
7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal.

No

Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`.

Closes apache#29910 from tomvanbussel/backport-SPARK-32901.

Authored-by: Tom van Bussel <tom.vanbussel@databricks.com>
Signed-off-by: herman <herman@databricks.com>

Ref: LIHADOOP-58062

RB=2551503
BUG=LIHADOOP-58062
G=spark-reviewers
R=mmuralid
A=mmuralid
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants