[SPARK-11801][CORE] Notify driver when OOM is thrown before executor … #9866
…JVM is killed
This fix tries to ensure that a task which catches an OOM reports its status to the driver, so that the driver logs contain enough information to explain why tasks or the executor were lost. The fix does the following:
Registers a shutdown hook for the executor, which does the following:
a) Synchronizes with the OOM handler thread. (The assumption is that the OOM thread is still running and acquires the lock before the shutdown hook thread does. I considered introducing a delay, but several runs with the fix never hit that situation.)
b) Kills all remaining tasks running in the current container. (I thought it would be good to clean up the tasks properly, so that they don't do any work that might throw unwanted errors or exceptions.)
c) Sleeps for some time so that the OOM handler's status is flushed to the driver. (Without the sleep, the status message is lost.)
Adds a separate handler for OOM, so that a proper message can be sent to the driver.
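The sequence described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from this patch: `OomShutdownSketch`, `DriverChannel`, `RunningTask`, `handleOom`, and `onShutdown` are hypothetical names, and the flush delay is an arbitrary value chosen for the example. Only `Runtime.addShutdownHook` is a real JVM API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class OomShutdownSketch {
    /** Hypothetical stand-in for the executor-to-driver status channel. */
    interface DriverChannel { void send(String message); }

    /** Hypothetical stand-in for a task running in this executor. */
    static final class RunningTask {
        final long id;
        volatile boolean killed = false;
        RunningTask(long id) { this.id = id; }
        void kill() { killed = true; }
    }

    private final Object oomLock = new Object();
    private final List<RunningTask> tasks = new CopyOnWriteArrayList<>();
    private final DriverChannel driver;
    private final long flushDelayMillis;

    OomShutdownSketch(DriverChannel driver, long flushDelayMillis) {
        this.driver = driver;
        this.flushDelayMillis = flushDelayMillis;
    }

    void register(RunningTask task) { tasks.add(task); }

    /** Separate OOM handler: reports the failure while holding the lock. */
    void handleOom(long taskId, OutOfMemoryError error) {
        synchronized (oomLock) {
            driver.send("task " + taskId + " failed: " + error);
        }
    }

    /** Body of the shutdown hook, following steps a) to c). */
    void onShutdown() {
        // a) Wait for an OOM report that is already in progress. If the
        //    handler has not taken the lock yet, this does not wait for it.
        synchronized (oomLock) { }
        // b) Kill the tasks that are still running.
        for (RunningTask task : tasks) {
            task.kill();
        }
        // c) Give the status message time to reach the driver.
        try {
            Thread.sleep(flushDelayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Registers onShutdown() as a JVM shutdown hook. */
    void install() {
        Runtime.getRuntime().addShutdownHook(
            new Thread(this::onShutdown, "executor-oom-shutdown-hook"));
    }

    public static void main(String[] args) {
        OomShutdownSketch sketch = new OomShutdownSketch(System.out::println, 100L);
        sketch.register(new RunningTask(1L));
        sketch.install();
        // Simulated failure; the hook runs when the JVM exits after main returns.
        sketch.handleOom(1L, new OutOfMemoryError("simulated"));
    }
}
```

As in the description, the empty `synchronized` block only helps when the OOM handler acquires the lock first; the sketch does nothing to enforce that ordering.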