[SPARK-47279][CORE] When the messageLoop encounters a fatal exception, such as OOM, exit the JVM to avoid the driver hanging forever #45385

Closed
lastbus wants to merge 4 commits into apache:branch-3.5 from lastbus:branch-3.5

@lastbus lastbus commented Mar 5, 2024

What changes were proposed in this pull request?

When a task finishes and sends its result back to the driver, but the driver cannot create a new thread because of insufficient memory, the driver hangs indefinitely. In this situation, we choose to exit the JVM.

In my opinion, the root cause is that the OutOfMemoryError is not handled properly by the driver. The driver does not install a handler via Thread.setDefaultUncaughtExceptionHandler, so the OOM thrown by the MessageLoop goes unhandled.
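
To make the mechanism concrete, here is a minimal Java sketch of a default uncaught-exception handler that terminates the JVM when a thread dies from an uncaught throwable. It is not the diff from this PR: the class name `FatalErrorExitHandler`, the `install` helper, and both exit codes are hypothetical and chosen only for illustration. The exit action is injected so the handler can be exercised without stopping the process.

```java
import java.util.function.IntConsumer;

// Illustrative only: a process-wide handler that terminates the JVM when a
// thread dies from an uncaught throwable, so the process does not linger.
final class FatalErrorExitHandler implements Thread.UncaughtExceptionHandler {
    // Hypothetical exit codes chosen for this sketch.
    static final int EXIT_OOM = 52;
    static final int EXIT_UNCAUGHT = 50;

    // The exit action is injected so the handler can be exercised without
    // actually stopping the JVM.
    private final IntConsumer exitAction;

    FatalErrorExitHandler(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    // Walks the cause chain and picks the OOM code if any link is an
    // OutOfMemoryError; everything else gets the generic code.
    static int exitCodeFor(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof OutOfMemoryError) {
                return EXIT_OOM;
            }
        }
        return EXIT_UNCAUGHT;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable t) {
        int code = exitCodeFor(t);
        System.err.println("Uncaught throwable in thread " + thread.getName()
                + ", exiting with code " + code + ": " + t);
        exitAction.accept(code);
    }

    // Installs the handler for every thread that has no handler of its own.
    // Runtime.halt skips shutdown hooks, which may themselves stall after an OOM.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new FatalErrorExitHandler(code -> Runtime.getRuntime().halt(code)));
    }
}
```

A handler like this only runs when the throwable actually escapes the thread's `run` method. If the loop catches the error and keeps going, or the work is submitted through an API that stores the failure in a `Future`, the handler is never invoked and the error would have to be rethrown or handled at that point instead.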

Why are the changes needed?

A fatal exception may cause the driver to hang indefinitely.

Does this PR introduce any user-facing change?

No

How was this patch tested?

N/A

Was this patch authored or co-authored using generative AI tooling?

No

…such as oom, exit the JVM to avoid the driver hanging forever
@github-actions github-actions bot added the CORE label Mar 5, 2024
@yaooqinn
Member

Rather than handling such a special case here, we can rely on the JVM, which already provides options for dealing with OutOfMemoryError.
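
The comment does not name specific options. One HotSpot flag of this kind is `-XX:+ExitOnOutOfMemoryError`, which on a Spark driver would typically be passed through `spark.driver.extraJavaOptions`. The Java sketch below only checks whether the running JVM was started with that flag. The class and method names are hypothetical, and whether the flag covers the thread-creation failure described in this PR depends on the JDK version and should be verified.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative only: reports whether this JVM was launched with the
// HotSpot flag that exits the process on the first OutOfMemoryError.
final class OomFlagCheck {
    static final String EXIT_ON_OOM = "-XX:+ExitOnOutOfMemoryError";

    // Pure helper so the check can be exercised with any argument list.
    static boolean hasExitOnOom(List<String> jvmArgs) {
        return jvmArgs.contains(EXIT_ON_OOM);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(EXIT_ON_OOM + " set: " + hasExitOnOom(jvmArgs));
    }
}
```

Logging this once at driver startup is a cheap way to confirm that the option actually reached the JVM.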

…such as oom, exit the JVM to avoid the driver hanging indefinitely
@monkeyboy123
Contributor

Rather than handling such a special case here, we can rely on the JVM, which already provides options for dealing with OutOfMemoryError.

+1

ke.ma and others added 2 commits March 12, 2024 15:07
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 26, 2024
@github-actions github-actions bot closed this Jul 27, 2024