[SPARK-26265][Core][Followup] Put freePage into a finally block #23294

Closed
wants to merge 1 commit into from

viirya
Member

@viirya viirya commented Dec 12, 2018

What changes were proposed in this pull request?

Based on the comment, it seems better to put freePage into a finally block. This patch is a follow-up that does so.
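
As a rough illustration of the change described above, the sketch below shows the general shape of freeing a page in a `finally` block once the iterator's lock has been released. It is not the Spark source: `PageIteratorSketch`, its `String` pages, the `failWhileAdvancing` flag, and the `freePage` stand-in are all hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an iterator over memory pages; not Spark's MapIterator.
final class PageIteratorSketch {
  private final List<String> pages;
  private final boolean failWhileAdvancing;
  // Records which pages were handed to freePage, so the behavior can be observed.
  final List<String> freed = new ArrayList<>();

  PageIteratorSketch(List<String> initialPages, boolean failWhileAdvancing) {
    this.pages = new ArrayList<>(initialPages);
    this.failWhileAdvancing = failWhileAdvancing;
  }

  // Stand-in for a freePage call that would take another lock in a real system.
  void freePage(String page) {
    freed.add(page);
  }

  void advanceToNextPage() {
    String pageToFree = null;
    try {
      synchronized (this) {
        if (!pages.isEmpty()) {
          // Only remember the page here; do not free it while holding the lock.
          pageToFree = pages.remove(0);
        }
        if (failWhileAdvancing) {
          throw new IllegalStateException("simulated failure while advancing");
        }
      }
    } finally {
      // Reached after the monitor on `this` is released, whether or not the block threw.
      if (pageToFree != null) {
        freePage(pageToFree);
      }
    }
  }
}
```

With `failWhileAdvancing` set to true, `advanceToNextPage` still propagates the exception, but the removed page reaches `freePage` first, so it is not leaked.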

How was this patch tested?

Existing tests.

@viirya
Member Author

viirya commented Dec 12, 2018

cc @cloud-fan @kiszk

@viirya
Member Author

viirya commented Dec 12, 2018

@cloud-fan do we also need a similar followup for branch-2.4?

@cloud-fan
Contributor

I think this can be merged to 2.4 without conflict. I'll ping you if it can't. Thanks!

@kiszk
Member

kiszk commented Dec 12, 2018

LGTM, thanks

@SparkQA

SparkQA commented Dec 12, 2018

Test build #100002 has finished for PR 23294 at commit 630281b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Dec 12, 2018

retest this please

@SparkQA

SparkQA commented Dec 12, 2018

Test build #100011 has finished for PR 23294 at commit 630281b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 12, 2018

Test build #100013 has finished for PR 23294 at commit 630281b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Argh, again!

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 24] do not match the length of object [0]
Execution halted

@felixcheung, @viirya

@viirya
Member Author

viirya commented Dec 12, 2018

@HyukjinKwon Thanks for letting me know. I will look into it and ask the CRAN admin for help.

@viirya
Member Author

viirya commented Dec 12, 2018

retest this please.

@viirya
Member Author

viirya commented Dec 12, 2018

@HyukjinKwon I can't find any problem locally. Let me re-trigger the Jenkins test to confirm.

@SparkQA

SparkQA commented Dec 12, 2018

Test build #100026 has finished for PR 23294 at commit 630281b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Dec 12, 2018

Sorry, I found out where the problem is. I have already asked the CRAN admin for help.

@HyukjinKwon
Member

Thanks, @viirya.

@HyukjinKwon
Member

Hey, @viirya, when the tests get back to normal, can you please send an email to the dev mailing list where we're discussing this?

@viirya
Member Author

viirya commented Dec 13, 2018

@HyukjinKwon Ok. But I think we'd better discuss this on the related JIRA ticket, SPARK-24152.

@HyukjinKwon
Member

Yea, that sounds like a more appropriate place to discuss it. Also, it looks like people are getting confused about why the tests are failing, so let's send a new email, as a new thread, once the tests are back to normal.

@HyukjinKwon
Member

retest this please

@viirya
Member Author

viirya commented Dec 13, 2018

retest this please.

@SparkQA

SparkQA commented Dec 13, 2018

Test build #100061 has finished for PR 23294 at commit 630281b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

all pass

}
try {
Closeables.close(reader, /* swallowIOException = */ false);
Contributor

Shall we close the reader outside of the synchronized block? Then we don't need the extra try-catch.

Member Author

In the second try, it accesses spillWriters:

reader = spillWriters.getFirst().getReader(serializerManager)

spill, which is synchronized, changes spillWriters. Doesn't it seem unsafe to close the reader and reassign it as above outside the synchronized block?

Contributor

how about

UnsafeSorterSpillReader readerToClose = null;
synchronized (this) {
  ...
  readerToClose = reader;
  reader = spillWriters.getFirst().getReader(serializerManager);
  ...
}
try {
  Closeables.close(readerToClose, /* swallowIOException = */ false);
} catch ...
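
The suggestion above can be sketched as a small self-contained example. This only illustrates the swap-then-close pattern under discussion and is not the code that was merged: `ReaderSwapSketch` and its queue of pending readers are hypothetical, a plain `close()` call stands in for Guava's `Closeables.close`, and wrapping the `IOException` in an `UncheckedIOException` is an assumption made to keep the example short.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical holder of a current reader plus readers still to be opened.
final class ReaderSwapSketch {
  private final Deque<Closeable> pending = new ArrayDeque<>();
  private Closeable reader;

  ReaderSwapSketch(Closeable first, Closeable next) {
    this.reader = first;
    this.pending.add(next);
  }

  void advance() {
    Closeable readerToClose;
    synchronized (this) {
      // Swap the shared fields while holding the lock...
      readerToClose = reader;
      reader = pending.pollFirst(); // null once nothing is left
    }
    // ...and do the potentially slow, failure-prone close after releasing it.
    if (readerToClose != null) {
      try {
        readerToClose.close();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  synchronized Closeable current() {
    return reader;
  }
}
```

Only the local `readerToClose` is touched outside the lock, so the shared `reader` field is still read and written under synchronization.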

Member

I'm not sure of the semantics here, but because the close block is changing several fields at once, it seems more conservative to leave them as they are within the synchronized block. At least, that is a separate issue.

Member Author

@cloud-fan What do you think?

Member

I think it's okay to go ahead as is

@SparkQA

SparkQA commented Dec 13, 2018

Test build #100074 has finished for PR 23294 at commit 630281b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 14, 2018

Test build #4468 has finished for PR 23294 at commit 630281b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2018

Test build #4471 has finished for PR 23294 at commit 630281b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@HyukjinKwon
Member

Tests already passed for the same commit at #23294 (comment).

Merged to master and branch-2.4.

asfgit pushed a commit that referenced this pull request Dec 15, 2018
## What changes were proposed in this pull request?

Based on the [comment](#23272 (comment)), it seems better to put `freePage` into a `finally` block. This patch is a follow-up that does so.

## How was this patch tested?

Existing tests.

Closes #23294 from viirya/SPARK-26265-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 1b604c1)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@asfgit asfgit closed this in 1b604c1 Dec 15, 2018
@SparkQA

SparkQA commented Dec 15, 2018

Test build #100174 has finished for PR 23294 at commit 630281b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…locking both BytesToBytesMap.MapIterator and TaskMemoryManager

In `BytesToBytesMap.MapIterator.advanceToNextPage`, we first lock this `MapIterator` and then `TaskMemoryManager` when freeing a memory page by calling `freePage`. At the same time, another memory consumer may first lock `TaskMemoryManager` and then this `MapIterator` when it acquires memory and causes a spill on this `MapIterator`.

As a result, the thread in `advanceToNextPage` holds the lock on the `MapIterator` object and waits for the lock on `TaskMemoryManager`, while the other consumer holds the lock on `TaskMemoryManager` and waits for the lock on the `MapIterator` object.

To avoid this deadlock, this patch keeps a reference to the page to free and frees it after releasing the lock on the `MapIterator`.
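
The lock-ordering problem described above can be shown with a minimal sketch. The classes `MemoryManagerSketch` and `MapIteratorSketch`, the `owner` parameter, and the `ownerLockHeldDuringFree` flag are hypothetical and exist only to make the ordering observable; they are not Spark APIs. `Thread.holdsLock` is used to record whether the iterator's monitor was still held when the manager's lock was taken.

```java
// Hypothetical stand-in for a shared memory manager; not Spark's TaskMemoryManager.
final class MemoryManagerSketch {
  // True if the caller still held the owner's monitor when it asked for the free.
  boolean ownerLockHeldDuringFree;

  // Synchronized, so calling this acquires the manager's lock.
  synchronized void freePage(Object owner, long page) {
    ownerLockHeldDuringFree = Thread.holdsLock(owner);
  }
}

// Hypothetical stand-in for the iterator; not Spark's MapIterator.
final class MapIteratorSketch {
  private final MemoryManagerSketch manager;
  private long currentPage = 1L;

  MapIteratorSketch(MemoryManagerSketch manager) {
    this.manager = manager;
  }

  // Risky order: iterator lock first, then manager lock. A thread that holds the
  // manager lock and then wants the iterator lock (to spill) can deadlock with this.
  void advanceFreeingInsideLock() {
    synchronized (this) {
      manager.freePage(this, currentPage);
      currentPage = -1L;
    }
  }

  // Safer order: remember the page under the iterator lock, release that lock,
  // and only then take the manager lock to free the page.
  void advanceFreeingAfterLock() {
    long pageToFree;
    synchronized (this) {
      pageToFree = currentPage;
      currentPage = -1L;
    }
    manager.freePage(this, pageToFree);
  }
}
```

In the second variant the thread never holds both locks at once, so it cannot take part in the circular wait between the two monitors.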

Added a test, and manually ran it 100 times to make sure there is no deadlock.

Closes apache#23272 from viirya/SPARK-26265.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry-picked from commit a3bbca9)

[SPARK-26265][CORE][FOLLOWUP] Put freePage into a finally block

Based on the [comment](apache#23272 (comment)), it seems better to put `freePage` into a `finally` block. This patch is a follow-up that does so.

Existing tests.

Closes apache#23294 from viirya/SPARK-26265-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry-picked from commit 1b604c1)

Ref: LIHADOOP-43221

RB=1518143
BUG=LIHADOOP-43221
G=superfriends-reviewers
R=fli,mshen,yezhou,edlu
A=fli
@viirya viirya deleted the SPARK-26265-followup branch December 27, 2023 18:22