Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25081][Core]Nested spill in ShuffleExternalSorter should not access released memory page (branch-2.2) #22072

Closed
wants to merge 1 commit into from

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Aug 10, 2018

What changes were proposed in this pull request?

Backport #22062 to branch-2.2. Just two minor differences in the test:

  • branch-2.2 doesn't have SparkOutOfMemoryError. It's using OutOfMemoryError directly.
  • MockitoSugar is in a different package in old scalatest.

How was this patch tested?

Jenkins

…ry page

This issue is pretty similar to [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907).

"allocateArray" in [ShuffleInMemorySorter.reset](https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99) may trigger a spill and cause ShuffleInMemorySorter access the released `array`. Another task may get the same memory page from the pool. This will cause two tasks access the same memory page. When a task reads memory written by another task, many types of failures may happen. Here are some examples I  have seen:

- JVM crash. (This is easy to reproduce in a unit test as we fill newly allocated and deallocated memory with 0xa5 and 0x5a bytes which usually points to an invalid memory address)
- java.lang.IllegalArgumentException: Comparison method violates its general contract!
- java.lang.NullPointerException at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384)
- java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size -536870912 because the size after growing exceeds size limitation 2147483632

This PR resets states in `ShuffleInMemorySorter.reset` before calling `allocateArray` to fix the issue.

The new unit test will make JVM crash without the fix.

Closes apache#22062 from zsxwing/SPARK-25081.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
@rezasafi
Copy link
Contributor

LGTM

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Aug 10, 2018

Test build #94581 has finished for PR 22072 at commit 1a6452e.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Aug 10, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Aug 11, 2018

Test build #94589 has finished for PR 22072 at commit 1a6452e.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 26] do not match the length of object [0]

@felixcheung
Copy link
Member

jenkins retest this please

@SparkQA
Copy link

SparkQA commented Aug 11, 2018

Test build #94597 has started for PR 22072 at commit 1a6452e.

@gatorsmile
Copy link
Member

retest this please

@gatorsmile
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Aug 11, 2018

Test build #94624 has finished for PR 22072 at commit 1a6452e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Merged to 2.2.

asfgit pushed a commit that referenced this pull request Aug 12, 2018
…access released memory page (branch-2.2)

## What changes were proposed in this pull request?

Backport #22062 to branch-2.2. Just two minor differences in the test:

- branch-2.2 doesn't have `SparkOutOfMemoryError`. It's using `OutOfMemoryError` directly.
- MockitoSugar is in a different package in old scalatest.

## How was this patch tested?

Jenkins

Closes #22072 from zsxwing/SPARK-25081-2.2.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
@zsxwing zsxwing closed this Aug 13, 2018
@zsxwing zsxwing deleted the SPARK-25081-2.2 branch August 13, 2018 18:09
Willymontaz pushed a commit to criteo-forks/spark that referenced this pull request Sep 26, 2019
…access released memory page (branch-2.2)

## What changes were proposed in this pull request?

Backport apache#22062 to branch-2.2. Just two minor differences in the test:

- branch-2.2 doesn't have `SparkOutOfMemoryError`. It's using `OutOfMemoryError` directly.
- MockitoSugar is in a different package in old scalatest.

## How was this patch tested?

Jenkins

Closes apache#22072 from zsxwing/SPARK-25081-2.2.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
Willymontaz pushed a commit to criteo-forks/spark that referenced this pull request Sep 27, 2019
…access released memory page (branch-2.2)

## What changes were proposed in this pull request?

Backport apache#22062 to branch-2.2. Just two minor differences in the test:

- branch-2.2 doesn't have `SparkOutOfMemoryError`. It's using `OutOfMemoryError` directly.
- MockitoSugar is in a different package in old scalatest.

## How was this patch tested?

Jenkins

Closes apache#22072 from zsxwing/SPARK-25081-2.2.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this pull request Oct 11, 2021
…access released memory page (branch-2.2)

## What changes were proposed in this pull request?

Backport apache#22062 to branch-2.2. Just two minor differences in the test:

- branch-2.2 doesn't have `SparkOutOfMemoryError`. It's using `OutOfMemoryError` directly.
- MockitoSugar is in a different package in old scalatest.

## How was this patch tested?

Jenkins

Closes apache#22072 from zsxwing/SPARK-25081-2.2.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants