
[SPARK-26117][CORE][SQL]use SparkOutOfMemoryError instead of OutOfMemoryError when catch exception #23084

Closed
wants to merge 1 commit into from

Conversation

heary-cao
Contributor

What changes were proposed in this pull request?

PR #20014 introduced SparkOutOfMemoryError so that a Spark-managed allocation failure does not kill the entire executor the way a genuine OutOfMemoryError would. This PR applies the same idea to call sites that allocate memory through MemoryConsumer.allocatePage: when handling the allocation failure, catch SparkOutOfMemoryError instead of OutOfMemoryError.
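A minimal, self-contained sketch of why the narrowed catch matters. The class below is a hypothetical stand-in for Spark's SparkOutOfMemoryError (which extends java.lang.OutOfMemoryError), and allocatePage here only simulates an allocation failure; none of this is Spark's actual implementation:

```java
public class CatchNarrowingDemo {
    // Hypothetical stand-in for Spark's SparkOutOfMemoryError, which extends
    // java.lang.OutOfMemoryError so broad handlers still see it.
    static class SparkOOM extends OutOfMemoryError {
        SparkOOM(String msg) { super(msg); }
    }

    // Simulates an allocation: a Spark-managed failure throws the narrower
    // SparkOOM; a genuine JVM failure throws the base OutOfMemoryError.
    static void allocatePage(boolean sparkManaged) {
        if (sparkManaged) {
            throw new SparkOOM("Unable to acquire 64 bytes of memory, got 0");
        }
        throw new OutOfMemoryError("Java heap space");
    }

    public static void main(String[] args) {
        // Spark-managed failure: handled locally, the caller can spill and retry.
        try {
            allocatePage(true);
        } catch (SparkOOM oom) {
            System.out.println("caught Spark OOM: " + oom.getMessage());
        }

        // Genuine JVM OOM: the narrowed catch does NOT swallow it, so it
        // propagates (we catch the base class here only to demonstrate that).
        try {
            try {
                allocatePage(false);
            } catch (SparkOOM oom) {
                System.out.println("unreachable for a real JVM OOM");
            }
        } catch (OutOfMemoryError oom) {
            System.out.println("JVM OOM propagated: " + oom.getMessage());
        }
    }
}
```

The point of the change: catching the narrower type handles only failures Spark itself signalled, while a real heap exhaustion still propagates and can terminate the process, which is the safer default.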

How was this patch tested?

N/A

@heary-cao
Contributor Author

cc @cloud-fan @kiszk @dongjoon-hyun

@@ -741,7 +742,7 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff
     if (numKeys >= growthThreshold && longArray.size() < MAX_CAPACITY) {
       try {
         growAndRehash();
-      } catch (OutOfMemoryError oom) {
+      } catch (SparkOutOfMemoryError oom) {
Contributor
do you know what was the behavior before? Will we propagate the exception all the way up and kill the executor?

@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

add to whitelist

@cloud-fan
Contributor

good catch! thanks!

@kiszk
Member

kiszk commented Nov 19, 2018

I think that we need to take care of UnsafeExternalSorterSuite.testGetIterator and UnsafeInMemorySorterSuite.freeAfterOOM, too.

@SparkQA

SparkQA commented Nov 19, 2018

Test build #99006 has finished for PR 23084 at commit 7345942.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 19, 2018

Test build #99002 has finished for PR 23084 at commit 7345942.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Contributor Author

@cloud-fan, I think it does not kill the executor. When the memory allocation fails, the caller is notified and can take other action; for example, HashAggregateExec will spill the map and fall back to sort-based aggregation.
Please correct me if I have misunderstood something. Thanks.
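The fallback described here can be sketched as follows. This is a hypothetical illustration of the pattern, not HashAggregateExec's actual code; the class and method names are invented, and the SparkOOM class stands in for Spark's SparkOutOfMemoryError:

```java
public class FallbackDemo {
    // Stand-in for Spark's SparkOutOfMemoryError.
    static class SparkOOM extends OutOfMemoryError {
        SparkOOM(String m) { super(m); }
    }

    // A hash-based pass that, on a Spark-managed allocation failure, "spills"
    // and finishes with a sort-based pass instead of killing the JVM.
    static String aggregate(boolean memoryExhausted) {
        try {
            if (memoryExhausted) {
                throw new SparkOOM("Unable to acquire page");
            }
            return "hash-based result";
        } catch (SparkOOM oom) {
            // Spill the in-memory map, then fall back to sort-based aggregation.
            return "sort-based result after spill";
        }
    }

    public static void main(String[] args) {
        System.out.println(aggregate(false)); // hash path succeeds
        System.out.println(aggregate(true));  // Spark OOM triggers the fallback
    }
}
```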

@SparkQA

SparkQA commented Nov 20, 2018

Test build #99045 has finished for PR 23084 at commit 400bcd5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 20, 2018

Test build #99046 has finished for PR 23084 at commit 400bcd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

for example, HashAggregateExec will carry out spill the map and fallback to sort-based

Do you mean this patch changes nothing for this case?

@heary-cao
Contributor Author

I think growAndRehash always throws SparkOutOfMemoryError, so catching either SparkOutOfMemoryError or OutOfMemoryError will set canGrowArray = false. Please correct me if I have misunderstood something. Thanks.
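The canGrowArray behavior being discussed can be sketched like this. A hypothetical, simplified model of the append/growAndRehash interaction, assuming (as stated above) that a failed grow only disables further growth; SparkOOM is a stand-in for Spark's SparkOutOfMemoryError and the capacity logic is invented for illustration:

```java
public class GrowGuardDemo {
    // Stand-in for Spark's SparkOutOfMemoryError.
    static class SparkOOM extends OutOfMemoryError {
        SparkOOM(String m) { super(m); }
    }

    private boolean canGrowArray = true;
    private int capacity = 4;
    private final int maxHealthyCapacity; // grow attempts beyond this "fail"

    GrowGuardDemo(int maxHealthyCapacity) { this.maxHealthyCapacity = maxHealthyCapacity; }

    // Simulates a rehash that runs out of Spark-managed memory.
    private void growAndRehash() {
        if (capacity * 2 > maxHealthyCapacity) {
            throw new SparkOOM("Unable to acquire memory for rehash");
        }
        capacity *= 2;
    }

    // Mirrors the guarded grow in append(): on a Spark-managed OOM, stop
    // growing but keep the map usable instead of letting the error escape.
    void maybeGrow() {
        if (canGrowArray) {
            try {
                growAndRehash();
            } catch (SparkOOM oom) {
                canGrowArray = false;
            }
        }
    }

    int capacity() { return capacity; }
    boolean canGrow() { return canGrowArray; }
}
```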

@heary-cao
Contributor Author

So this PR is just a code cleanup, not a fix for a real bug. Thanks.

@cloud-fan
Contributor

I think it's safer to only catch the spark-thrown OOM, not the system OOM, so LGTM

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 466d011 Nov 23, 2018
@heary-cao
Contributor Author

@cloud-fan, thanks

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
[SPARK-26117][CORE][SQL] Use SparkOutOfMemoryError instead of OutOfMemoryError when catch exception

## What changes were proposed in this pull request?

PR apache#20014 introduced `SparkOutOfMemoryError` to avoid killing the entire executor when an `OutOfMemoryError` is thrown. So at call sites that allocate memory through `MemoryConsumer.allocatePage`, catch `SparkOutOfMemoryError` instead of `OutOfMemoryError`.

## How was this patch tested?
N/A

Closes apache#23084 from heary-cao/SparkOutOfMemoryError.

Authored-by: caoxuewen <cao.xuewen@zte.com.cn>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>