[SPARK-26265][Core][BRANCH-2.4] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager #23289

Closed
wants to merge 1 commit into from

viirya
Member

@viirya viirya commented Dec 11, 2018

What changes were proposed in this pull request?

In BytesToBytesMap.MapIterator.advanceToNextPage, we first lock this MapIterator and then lock TaskMemoryManager when freeing a memory page by calling freePage. At the same time, another memory consumer may first lock TaskMemoryManager and then this MapIterator when it acquires memory and triggers spilling on this MapIterator.

The result is a deadlock: the thread advancing the MapIterator holds the lock on the MapIterator object and waits for the lock on TaskMemoryManager, while the other consumer holds the lock on TaskMemoryManager and waits for the lock on the MapIterator object.

To avoid the deadlock, this patch keeps a reference to the page to be freed and frees it after releasing the lock on the MapIterator.
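
The idea of deferring the free until the iterator lock is released can be sketched as follows. This is a minimal illustration, not the actual Spark code: `PageManager` and `PageIterator` are hypothetical stand-ins for TaskMemoryManager and BytesToBytesMap.MapIterator, pages are plain strings, and the real classes have different fields and signatures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for TaskMemoryManager: every method locks the manager.
class PageManager {
    private final List<String> freed = new ArrayList<>();

    synchronized void freePage(String page) {
        freed.add(page);
    }

    synchronized List<String> freedPages() {
        return new ArrayList<>(freed);
    }
}

// Hypothetical stand-in for BytesToBytesMap.MapIterator.
class PageIterator {
    private final PageManager manager;
    private final Deque<String> pages = new ArrayDeque<>();
    private String currentPage;

    PageIterator(PageManager manager, List<String> initialPages) {
        this.manager = manager;
        this.pages.addAll(initialPages);
    }

    void advanceToNextPage() {
        String pageToFree;
        synchronized (this) {
            // Under the iterator lock, only remember which page is finished.
            // The manager is deliberately not called here.
            pageToFree = currentPage;
            currentPage = pages.poll(); // null once all pages are consumed
        }
        // The iterator lock has been released, so taking the manager lock
        // now cannot form a cycle with a thread that holds the manager lock
        // and is waiting for the iterator lock.
        if (pageToFree != null) {
            manager.freePage(pageToFree);
        }
    }

    synchronized String currentPage() {
        return currentPage;
    }
}
```

With this shape, no thread ever holds the iterator lock while requesting the manager lock, so the only remaining lock order is manager first, then iterator.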

This backports the fix to branch-2.4.

How was this patch tested?

Added a test and manually ran it 100 times to make sure there is no deadlock.
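
Because a deadlock like this depends on thread timing, a single passing run proves little, which is why the test was repeated. One way to automate such repetition is sketched below. It is an assumption about how this could be done, not the procedure used for this patch: `RepeatedDeadlockCheck` and `runRepeatedly` are hypothetical names, and a run that exceeds the timeout is only treated as a likely hang.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class RepeatedDeadlockCheck {
    // Hypothetical helper: runs the scenario `runs` times and returns true
    // only if every run finished within `timeoutMillis`.
    static boolean runRepeatedly(Runnable scenario, int runs, long timeoutMillis)
            throws Exception {
        // Daemon worker thread, so a stuck run cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        try {
            for (int i = 0; i < runs; i++) {
                Future<?> result = pool.submit(scenario);
                try {
                    result.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    // The run did not finish in time: treat it as a likely hang.
                    return false;
                }
            }
            return true;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A call such as `RepeatedDeadlockCheck.runRepeatedly(scenario, 100, 5000)` would repeat the scenario 100 times with a five-second limit per run; both numbers are illustrative.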

@viirya
Member Author

viirya commented Dec 11, 2018

cc @cloud-fan

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99978 has finished for PR 23289 at commit e408ea6.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99980 has finished for PR 23289 at commit d520a97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Thanks. Merged to branch-2.4.

asfgit pushed a commit that referenced this pull request Dec 11, 2018
…erator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager

## What changes were proposed in this pull request?

In `BytesToBytesMap.MapIterator.advanceToNextPage`, we first lock this `MapIterator` and then lock `TaskMemoryManager` when freeing a memory page by calling `freePage`. At the same time, another memory consumer may first lock `TaskMemoryManager` and then this `MapIterator` when it acquires memory and triggers spilling on this `MapIterator`.

The result is a deadlock: the thread advancing the `MapIterator` holds the lock on the `MapIterator` object and waits for the lock on `TaskMemoryManager`, while the other consumer holds the lock on `TaskMemoryManager` and waits for the lock on the `MapIterator` object.

To avoid the deadlock, this patch keeps a reference to the page to be freed and frees it after releasing the lock on the `MapIterator`.

This backports the fix to branch-2.4.

## How was this patch tested?

Added a test and manually ran it 100 times to make sure there is no deadlock.

Closes #23289 from viirya/SPARK-26265-2.4.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@viirya
Member Author

viirya commented Dec 12, 2018

Thanks @dongjoon-hyun @cloud-fan

@viirya viirya closed this Dec 12, 2018
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
@viirya viirya deleted the SPARK-26265-2.4 branch December 27, 2023 18:22