[SPARK-31149][PySpark] PySpark job not killing Spark Daemon processes after the executor is killed due to OOM #27903
avenherak wants to merge 1 commit into apache:branch-2.4
Conversation
Can one of the admins verify this patch?
  # Send SIGHUP to notify workers of shutdown
- os.kill(0, SIGHUP)
+ os.kill(0, SIGTERM)
  sys.exit(code)
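For context, here is a simplified sketch of how this shutdown path is wired up in python/pyspark/daemon.py; it is paraphrased rather than copied verbatim from branch-2.4, and the handler naming is illustrative:

```python
import os
import signal
import sys
from signal import SIGHUP, SIGTERM, SIG_DFL, SIG_IGN

def manager():
    # The daemon makes itself a process group leader, so a signal sent to
    # pid 0 reaches the daemon and every forked pyspark worker.
    os.setpgid(0, 0)

    def shutdown(code):
        # Restore the default SIGTERM disposition so the group-wide kill
        # below does not re-enter this handler.
        signal.signal(SIGTERM, SIG_DFL)
        # Before this patch the group was notified with SIGHUP;
        # the patch sends SIGTERM instead.
        os.kill(0, SIGTERM)
        sys.exit(code)

    # The daemon shuts down when YARN / the executor sends it SIGTERM,
    # and ignores SIGHUP itself so it is not killed by its own notification.
    signal.signal(SIGTERM, lambda *args: shutdown(1))
    signal.signal(SIGHUP, SIG_IGN)
```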
SIGHUP is supposed to gracefully kill the workers, and then we shut the daemon down gracefully via a proper sys.exit. Can you elaborate on the process termination behaviour before and after this fix?
Before the fix:
ps -ef|grep -i pyspark.daemon
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
After the fix:
ps -ef|grep -i pyspark.daemon
(no output: no pyspark.daemon processes remain)
@avenherak, please keep the PR template and elaborate on each item. Since this is a fix in core PySpark, the change should be thoroughly investigated and described in what this PR proposes.
What is "Solution to SPARK-31149", and what are the reproducible steps?
What changes were proposed in this pull request?
Changes made to shutdown() in daemon.py under python/pyspark (around line 104):

    def shutdown(code):
        signal.signal(SIGTERM, SIG_DFL)
        # Send SIGTERM to notify workers of shutdown
        os.kill(0, SIGTERM)
        sys.exit(code)

Solution to SPARK-31149: SIGHUP does not terminate all pyspark.daemon worker processes in every case, so the stronger SIGTERM is sent to the process group instead.
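To make the before/after behaviour concrete, here is a minimal, self-contained sketch (not Spark code; the worker below is hypothetical) showing that a child which ignores SIGHUP survives os.kill(0, SIGHUP), while os.kill(0, SIGTERM) terminates the whole process group:

```python
import os
import signal
import sys
import time

def worker():
    # Mimics a worker that does not react to SIGHUP; it would survive a
    # group-wide SIGHUP but dies on SIGTERM (default disposition).
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
    while True:
        time.sleep(1)

def shutdown(code, sig):
    # Same shape as daemon.py's shutdown(): reset our own SIGTERM handler,
    # then signal the whole process group (pid 0 = our own group).
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    os.kill(0, sig)
    sys.exit(code)

if __name__ == "__main__":
    os.setpgid(0, 0)          # parent and forked children share one group
    if os.fork() == 0:
        worker()
    time.sleep(0.5)
    # With signal.SIGHUP here the worker keeps running after the parent
    # exits; with signal.SIGTERM both parent and worker terminate.
    shutdown(0, signal.SIGTERM)
```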
Why are the changes needed?
PySpark kept running Spark daemon processes even after the Spark job was killed on YARN. With the change to daemon.py above, killing the job also terminates the daemon processes.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Before the fix:
ps -ef|grep -i pyspark.daemon
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
After the fix:
ps -ef|grep -i pyspark.daemon
(no output: no pyspark.daemon processes remain)
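For completeness, a small hedged helper (not part of this patch; the function name is made up) that mirrors the manual ps check above and can be run after killing a job:

```python
import subprocess

def lingering_pyspark_daemons():
    # List command lines of any pyspark.daemon processes still alive,
    # mirroring `ps -ef | grep -i pyspark.daemon`.
    out = subprocess.run(["ps", "-eo", "args"],
                         capture_output=True, text=True).stdout
    return [line for line in out.splitlines()
            if "pyspark.daemon" in line and "grep" not in line]

if __name__ == "__main__":
    daemons = lingering_pyspark_daemons()
    print(f"{len(daemons)} pyspark.daemon process(es) still running")
    for cmd in daemons:
        print(" ", cmd)
```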