[SPARK-2313] PySpark pass port rather than stdin #3424
Closed
lvsoft wants to merge 3 commits into apache:master from
Can one of the admins verify this patch?
Contributor
I think the motivation of SPARK-2313 is to remove the dependency on STDIN for returning the port to Python. Simply replacing it with a socket may work (though a domain socket may not work on Windows?). There is also a race condition: the peeked free port may be taken by another program before the gateway binds it. So, the approach would be:
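A minimal Python sketch of the race described in this comment. It only illustrates the problem, not a fix; `peek_free_port`, `squatter`, and `gateway` are hypothetical names. A port that was free when it was probed can be claimed by another program before the gateway gets to bind it.

```python
import socket


def peek_free_port():
    # Ask the OS for a free port, then release it again (the "peek").
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port


port = peek_free_port()

# Hypothetical other program that grabs the port in the gap.
squatter = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
squatter.bind(("127.0.0.1", port))
squatter.listen(1)

# The gateway now tries to bind the port it was told to use.
gateway = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    gateway.bind(("127.0.0.1", port))
    race_lost = False
except OSError:  # EADDRINUSE: the port is no longer free
    race_lost = True
finally:
    gateway.close()
    squatter.close()

print(race_lost)  # True on Linux/macOS
```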
Author
I think this is a better solution.
asfgit pushed a commit that referenced this pull request on Feb 16, 2015:
…hon driver

This patch changes PySpark so that the GatewayServer's port is communicated back to the Python process that launches it over a local socket instead of a pipe. The old pipe-based approach was brittle and could fail if `spark-submit` printed unexpected output to stdout. To accomplish this, I wrote a custom `PythonGatewayServer.main()` function to use in place of Py4J's `GatewayServer.main()`.

Closes #3424.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4603 from JoshRosen/SPARK-2313 and squashes the following commits:

6a7740b [Josh Rosen] Remove EchoOutputThread since it's no longer needed
0db501f [Josh Rosen] Use select() so that we don't block if GatewayServer dies.
9bdb4b6 [Josh Rosen] Handle case where getListeningPort returns -1
3fb7ed1 [Josh Rosen] Remove stdout=PIPE
2458934 [Josh Rosen] Use underscore to mark env var. as private
d12c95d [Josh Rosen] Use Logging and Utils.tryOrExit()
e5f9730 [Josh Rosen] Wrap everything in a giant try-block
2f70689 [Josh Rosen] Use stdin PIPE to share fate with driver
8bf956e [Josh Rosen] Initial cut at passing Py4J gateway port back to driver via socket

(cherry picked from commit 0cfda84)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
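The handshake described in this commit can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code from the commit: the child process is a hypothetical Python stand-in for the JVM gateway, the environment variable `_DEMO_CALLBACK_PORT` and the function `launch_and_get_port` are hypothetical names, and the port is assumed to be sent as a 4-byte big-endian integer. The launcher listens on an OS-assigned port, passes that port to the child through the environment, and waits with `select()` so that a child which dies before connecting cannot block it forever.

```python
import os
import select
import socket
import struct
import subprocess
import sys

# Hypothetical stand-in for the JVM gateway: it binds its own server to an
# OS-assigned port and reports that port back over the callback socket.
CHILD = r"""
import os, socket, struct
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # OS-assigned port: no peek/bind gap
server.listen(1)
port = server.getsockname()[1]
cb = socket.create_connection(("127.0.0.1", int(os.environ["_DEMO_CALLBACK_PORT"])))
cb.sendall(struct.pack("!i", port))  # 4-byte big-endian int
cb.close()
server.close()
"""


def launch_and_get_port(poll_s=1.0):
    # Listen on an ephemeral port for the child's callback connection.
    callback = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    callback.bind(("127.0.0.1", 0))
    callback.listen(1)
    env = dict(os.environ)
    env["_DEMO_CALLBACK_PORT"] = str(callback.getsockname()[1])  # hypothetical name

    # The child's stdout is not part of the handshake, so stray output is harmless.
    proc = subprocess.Popen([sys.executable, "-c", CHILD], env=env)
    port = None
    try:
        while port is None:
            # select() with a timeout instead of a blocking accept().
            readable, _, _ = select.select([callback], [], [], poll_s)
            if callback in readable:
                conn, _ = callback.accept()
                with conn, conn.makefile("rb") as f:
                    data = f.read(4)
                if len(data) != 4:
                    raise RuntimeError("child closed the callback early")
                port = struct.unpack("!i", data)[0]
            elif proc.poll() is not None:
                # Child exited: stop unless its connection is already queued.
                if not select.select([callback], [], [], 0)[0]:
                    break
    finally:
        callback.close()
        proc.wait()  # the demo child exits right after reporting its port
    if port is None or port == -1:
        raise RuntimeError("gateway exited or failed to report a listening port")
    return port


port = launch_and_get_port()
```

The `-1` check loosely mirrors the squashed commit that handles `getListeningPort` returning -1. In this demo the child closes its server right away, so the returned port is only meaningful as a demonstration of the handshake; a real gateway would keep listening on it.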
This patch fixes [SPARK-2313].
It peeks an available free port number and passes it to the Py4J gateway for binding via a command-line argument.
The port scan starts from a value derived from the PID (the PID modulo some range), which should avoid conflicts if multiple PySpark instances are supported in the future. The port number that Py4J prints to STDIN is also parsed as a double check.
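A minimal Python sketch of the PID-seeded port scan described above. It is illustrative only: the PR does not state the modulus or base port, so `PORT_BASE`, `PORT_SPAN`, and `peek_free_port` are hypothetical. Each candidate port is probed by binding a throwaway socket, and the first port that binds is returned.

```python
import os
import socket

# Hypothetical range: the PR only says the scan starts at "the mod of PID".
PORT_BASE = 20000
PORT_SPAN = 10000


def peek_free_port(pid=None):
    """Return the first bindable port, scanning from an offset derived from the PID."""
    pid = os.getpid() if pid is None else pid
    start = pid % PORT_SPAN
    for i in range(PORT_SPAN):
        port = PORT_BASE + (start + i) % PORT_SPAN  # wrap around within the range
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("127.0.0.1", port))
        except OSError:
            continue  # port in use; try the next one
        finally:
            s.close()  # release the probe socket either way
        return port
    raise RuntimeError("no free port in range")


port = peek_free_port()
# The gateway would then be launched with str(port) as a command-line argument.
```

Because the probe socket is closed before the gateway binds the port, another process can still take the port in between; this is the race condition raised in the review comment.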