Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21907][CORE][BACKPORT 2.2] oom during spill #19481

Conversation

eyalfa
Copy link

@eyalfa eyalfa commented Oct 12, 2017

back-port #19181 to branch-2.2.

What changes were proposed in this pull request?

  1. a test reproducing SPARK-21907
  2. a fix for the root cause of the issue.

org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill calls org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset which may trigger another spill,
when this happens the array member is already de-allocated but still referenced by the code, this causes the nested spill to fail with an NPE in org.apache.spark.memory.TaskMemoryManager.getPage.
This patch introduces a reproduction in a test case and a fix, the fix simply sets the in-mem sorter's array member to an empty array before actually performing the allocation. This prevents the spilling code from 'touching' the de-allocated array.

How was this patch tested?

introduced a new test case: org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorterSuite#testOOMDuringSpill.

1. a test reproducing [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907)
2. a fix for the root cause of the issue.

`org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill` calls `org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset` which may trigger another spill,
when this happens the `array` member is already de-allocated but still referenced by the code, this causes the nested spill to fail with an NPE in `org.apache.spark.memory.TaskMemoryManager.getPage`.
This patch introduces a reproduction in a test case and a fix, the fix simply sets the in-mem sorter's array member to an empty array before actually performing the allocation. This prevents the spilling code from 'touching' the de-allocated array.

introduced a new test case: `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorterSuite#testOOMDuringSpill`.

Author: Eyal Farago <eyal@nrgene.com>

Closes apache#19181 from eyalfa/SPARK-21907__oom_during_spill.
@eyalfa
Copy link
Author

eyalfa commented Oct 12, 2017

@hvanhovell , @juliuszsompolski

@hvanhovell
Copy link
Contributor

ok to test

@hvanhovell
Copy link
Contributor

add to whitelist

@hvanhovell
Copy link
Contributor

LGTM pending jenkins

@SparkQA
Copy link

SparkQA commented Oct 12, 2017

Test build #82674 has finished for PR 19481 at commit 9b6c5d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Oct 12, 2017
back-port #19181 to branch-2.2.

## What changes were proposed in this pull request?
1. a test reproducing [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907)
2. a fix for the root cause of the issue.

`org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill` calls `org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset` which may trigger another spill,
when this happens the `array` member is already de-allocated but still referenced by the code, this causes the nested spill to fail with an NPE in `org.apache.spark.memory.TaskMemoryManager.getPage`.
This patch introduces a reproduction in a test case and a fix, the fix simply sets the in-mem sorter's array member to an empty array before actually performing the allocation. This prevents the spilling code from 'touching' the de-allocated array.

## How was this patch tested?
introduced a new test case: `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorterSuite#testOOMDuringSpill`.

Author: Eyal Farago <eyal@nrgene.com>

Closes #19481 from eyalfa/SPARK-21907__oom_during_spill__BACKPORT-2.2.
@hvanhovell
Copy link
Contributor

Merged to 2.2. Thanks!

Can you close the PR?

@hvanhovell
Copy link
Contributor

@eyalfa can you close the PR? The github infra does not this automatically for backports.

@eyalfa eyalfa closed this Oct 13, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
back-port apache#19181 to branch-2.2.

## What changes were proposed in this pull request?
1. a test reproducing [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907)
2. a fix for the root cause of the issue.

`org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill` calls `org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset` which may trigger another spill,
when this happens the `array` member is already de-allocated but still referenced by the code, this causes the nested spill to fail with an NPE in `org.apache.spark.memory.TaskMemoryManager.getPage`.
This patch introduces a reproduction in a test case and a fix, the fix simply sets the in-mem sorter's array member to an empty array before actually performing the allocation. This prevents the spilling code from 'touching' the de-allocated array.

## How was this patch tested?
introduced a new test case: `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorterSuite#testOOMDuringSpill`.

Author: Eyal Farago <eyal@nrgene.com>

Closes apache#19481 from eyalfa/SPARK-21907__oom_during_spill__BACKPORT-2.2.
sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this pull request Oct 13, 2021
back-port apache#19181 to branch-2.2.

1. a test reproducing [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907)
2. a fix for the root cause of the issue.

`org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill` calls `org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset` which may trigger another spill,
when this happens the `array` member is already de-allocated but still referenced by the code, this causes the nested spill to fail with an NPE in `org.apache.spark.memory.TaskMemoryManager.getPage`.
This patch introduces a reproduction in a test case and a fix, the fix simply sets the in-mem sorter's array member to an empty array before actually performing the allocation. This prevents the spilling code from 'touching' the de-allocated array.

introduced a new test case: `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorterSuite#testOOMDuringSpill`.

Author: Eyal Farago <eyal@nrgene.com>

Closes apache#19481 from eyalfa/SPARK-21907__oom_during_spill__BACKPORT-2.2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants