[SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM #2941
Test build #22202 has started for PR 2941 at commit
|
Ah, so there was a race between Python telling Java that it was exiting and Java realizing that Python was going to exit? It looks like this should fix that by having the Python worker wait until the JVM has acknowledged that it knows that Python is going to exit. |
The race is over which thread, the reader or the writer, first notices that the worker has exited. If the reader notices first, there is no problem, but if the writer notices first, it will throw an IOException. |
It's not easy to reproduce this failure, but it did fail in Jenkins:
|
Also, I cannot reproduce this when running without daemon.py (which simulates the behavior on Windows). |
This fix looks good to me, so I'm going to merge it in to unblock the other PRs (I ran the tests locally and it doesn't look like this introduced any regressions). We can continue to revisit if we encounter issues. |
Test build #22202 has finished for PR 2941 at commit
|
Test FAILed. |
In the case of take() or an exception in Python, the Python worker may exit before the JVM has read() the entire response, and the writer thread may then raise a "Connection reset" exception.
Python should always wait for the JVM to close the socket first.
cc @JoshRosen This is a warm fix; without it the tests will be flaky. Sorry about that.
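
To illustrate the idea of "wait for the JVM to close the socket first", here is a minimal standalone sketch. It is not the code from this patch: `wait_for_peer_close`, `demo`, and the newline-terminated message are hypothetical names and conventions chosen for the example, and a local thread stands in for the JVM side.

```python
import socket
import threading


def wait_for_peer_close(sock, bufsize=4096):
    """Block until the peer closes its end of the socket.

    Hypothetical helper: returns the number of bytes that were read
    and discarded while waiting for end-of-stream.
    """
    discarded = 0
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # an empty read means the peer closed the connection
            return discarded
        discarded += len(chunk)


def demo():
    # worker_end plays the Python worker, jvm_end plays the JVM (assumption).
    worker_end, jvm_end = socket.socketpair()
    received = []

    def fake_jvm():
        # Read the whole response (newline-terminated in this sketch),
        # and only then close the socket.
        buf = b""
        while not buf.endswith(b"\n"):
            buf += jvm_end.recv(1024)
        received.append(buf)
        jvm_end.close()

    jvm_thread = threading.Thread(target=fake_jvm)
    jvm_thread.start()

    worker_end.sendall(b"result\n")
    # Instead of exiting right after the write, wait for the peer to close.
    leftover = wait_for_peer_close(worker_end)
    worker_end.close()
    jvm_thread.join()
    return received[0], leftover


if __name__ == "__main__":
    print(demo())
```

Because the worker side closes only after it sees end-of-stream, the peer in this sketch always finishes reading the response before the connection goes away.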