[SPARK-6667] [PySpark] remove setReuseAddress
The reused address on the server side caused the server to be unable to acknowledge incoming connections, so remove it.
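The server side can be pictured with a Python analogue of the one-shot server that `serveIterator` sets up, written without `SO_REUSEADDR`. This is a sketch, not PySpark code. The name `serve_once` and the newline-delimited text payload are assumptions. Only three details come from the diff: the OS-chosen port (port 0), the backlog of 1, and the 3-second accept timeout.

```python
import socket
import threading

def serve_once(items, timeout=3.0):
    """Serve `items` to a single client on an OS-chosen localhost port.

    Hypothetical analogue of PythonRDD.serveIterator: SO_REUSEADDR is left
    at its default, and the listener is closed after one accept or a timeout.
    Returns the port; the actual serving happens in a daemon thread.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("localhost", 0))  # port 0: the OS picks a free ephemeral port
    server.listen(1)               # backlog of 1, like new ServerSocket(0, 1)
    server.settimeout(timeout)     # like setSoTimeout(3000): don't wait forever
    port = server.getsockname()[1]

    def run():
        try:
            conn, _ = server.accept()
        except socket.timeout:
            return                 # nobody connected in time; just give up
        finally:
            server.close()         # one client only; stop listening either way
        with conn:
            for item in items:
                conn.sendall(item.encode() + b"\n")

    threading.Thread(target=run, daemon=True).start()
    return port
```

A client connects to `("localhost", port)` and reads until EOF, which arrives when the server thread closes its end of the connection.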

This PR retries once after a timeout; it also adds a timeout on the client side.
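The client-side idea (a timeout plus a single retry) might look like the following. The helper `connect_with_retry` is hypothetical and is not a PySpark function. The diff on this page shows only the `settimeout(3)` call, not the retry itself.

```python
import socket

def connect_with_retry(port, timeout=3.0, retries=1):
    """Connect to a localhost server, retrying if an attempt times out.

    Hypothetical helper: makes `retries + 1` attempts in total and returns
    the connected socket, which keeps `timeout` for later reads.
    """
    last_error = None
    for _ in range(retries + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)   # client-side timeout: never block forever
        try:
            sock.connect(("localhost", port))
            return sock
        except socket.timeout as e:
            sock.close()           # discard the failed socket before retrying
            last_error = e
    raise last_error
```

Only `socket.timeout` triggers a retry. Other errors, such as a refused connection, propagate immediately rather than being retried.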

Author: Davies Liu <davies@databricks.com>

Closes #5324 from davies/collect_hang and squashes the following commits:

e5a51a2 [Davies Liu] remove setReuseAddress
7977c2f [Davies Liu] do retry on client side
b838f35 [Davies Liu] retry after timeout

(cherry picked from commit 0cce545)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Davies Liu authored and JoshRosen committed Apr 2, 2015
1 parent 758ebf7 commit a73055f
Showing 2 changed files with 1 addition and 1 deletion.
@@ -623,7 +623,6 @@ private[spark] object PythonRDD extends Logging {
    */
   private def serveIterator[T](items: Iterator[T], threadName: String): Int = {
     val serverSocket = new ServerSocket(0, 1)
-    serverSocket.setReuseAddress(true)
     // Close the socket if no connection in 3 seconds
     serverSocket.setSoTimeout(3000)
 
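Without the `setReuseAddress(true)` call, the listening socket keeps the platform's default for `SO_REUSEADDR`. When debugging a similar server, the option can be read back. The small Python sketch below uses a hypothetical helper name. It inspects a Python socket, not the JVM's `ServerSocket`.

```python
import socket

def reuse_address_enabled(sock):
    """Return True if SO_REUSEADDR is set on `sock` (hypothetical helper)."""
    # getsockopt returns an int; any nonzero value means the option is on
    # (the exact nonzero value differs between platforms).
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
```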
python/pyspark/rdd.py (1 change: 1 addition, 0 deletions)
@@ -114,6 +114,7 @@ def _parse_memory(s):
 
 def _load_from_socket(port, serializer):
     sock = socket.socket()
+    sock.settimeout(3)
     try:
         sock.connect(("localhost", port))
         rf = sock.makefile("rb", 65536)
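Calling `settimeout(3)` before `connect()` bounds every later blocking call on the socket as well, including reads through the `makefile` wrapper. The sketch below illustrates this. `read_all` is a hypothetical stand-in for `_load_from_socket` that returns raw bytes instead of running a serializer.

```python
import socket

def read_all(port, timeout=3.0):
    """Read everything a localhost server sends, failing instead of hanging.

    Hypothetical stand-in for _load_from_socket: the timeout is set before
    connect(), so it applies to connect() and to every read that follows.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(("localhost", port))
        with sock.makefile("rb", 65536) as rf:
            return rf.read()   # raises socket.timeout if the server stalls
    finally:
        sock.close()
```

If the server accepts the connection but never writes, the read raises `socket.timeout` after `timeout` seconds rather than blocking forever, which is the hang this change guards against.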
