
[SPARK-4799] Use IP address instead of local hostname in ConnectionManager #3645

Closed
smola wants to merge 1 commit

Conversation

smola
Contributor

@smola smola commented Dec 9, 2014

See https://issues.apache.org/jira/browse/SPARK-4799

Spark fails when a node hostname is not resolvable by other nodes.

See an example trace:

14/12/09 17:02:41 ERROR SendingConnection: Error connecting to 27e434cf36ac:35093
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:127)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
    at org.apache.spark.network.SendingConnection.connect(Connection.scala:299)
    at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:278)
    at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
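The failure mode in this trace can be reproduced outside Spark with plain NIO. A minimal sketch (the hostname `27e434cf36ac` stands in for an unresolvable Docker container ID; class and variable names are mine, not Spark's): `InetSocketAddress` silently stays unresolved when DNS lookup fails, and `SocketChannel.connect` then throws `UnresolvedAddressException`, exactly as in the trace above.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class UnresolvedDemo {
    public static void main(String[] args) throws Exception {
        // DNS cannot resolve a bare container ID, so this address
        // is created in the "unresolved" state.
        InetSocketAddress addr = new InetSocketAddress("27e434cf36ac", 35093);
        System.out.println("unresolved: " + addr.isUnresolved());

        try (SocketChannel ch = SocketChannel.open()) {
            // Connecting to an unresolved address throws
            // UnresolvedAddressException, matching the stack trace above.
            ch.connect(addr);
        } catch (UnresolvedAddressException e) {
            System.out.println("caught UnresolvedAddressException");
        }
    }
}
```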

The relevant code is here:

val id = new ConnectionManagerId(Utils.localHostName, serverChannel.socket.getLocalPort)


This piece of code should use the host IP with Utils.localIpAddress, or a method that acknowledges user settings (e.g. SPARK_LOCAL_IP). Since I cannot think of a use case for using the hostname here, I'm creating a PR with the former solution, but if you think the latter is better, I'm willing to create a new PR with a more elaborate fix.
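The "acknowledge user settings" option could be sketched roughly as follows. This is a hypothetical illustration, not Spark's actual `Utils.localIpAddress`: prefer `SPARK_LOCAL_IP` when it is set, and fall back to the machine's own address otherwise (`LocalIp` and `findLocalIp` are invented names for the sketch).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalIp {
    // Hypothetical helper: honor SPARK_LOCAL_IP if the user set it,
    // otherwise fall back to the JVM's view of the local address.
    static String findLocalIp() throws UnknownHostException {
        String configured = System.getenv("SPARK_LOCAL_IP");
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        return InetAddress.getLocalHost().getHostAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(findLocalIp());
    }
}
```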

@JoshRosen
Contributor

Jenkins, this is ok to test.

@JoshRosen
Contributor

/cc @rxin @aarondav, since this is NIO connection manager related.

@SparkQA

SparkQA commented Dec 24, 2014

Test build #24766 has finished for PR 3645 at commit 45d0356.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pwendell
Contributor

pwendell commented Jan 6, 2015

@smola hey - currently Utils.localHostName should respect SPARK_LOCAL_IP if it is set. It will do a reverse lookup and find the associated hostname. Could you describe the network set-up on your machines?
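The reverse lookup described here can be illustrated with plain `java.net.InetAddress` (using the loopback address for portability; the real `Utils.localHostName` logic is more involved, so treat this only as a sketch of the lookup step):

```java
import java.net.InetAddress;

public class ReverseLookup {
    public static void main(String[] args) throws Exception {
        // Resolve an IP literal, then ask the resolver for the hostname
        // associated with it. This is the kind of reverse lookup that
        // turns a configured SPARK_LOCAL_IP back into a hostname.
        InetAddress addr = InetAddress.getByName("127.0.0.1");
        System.out.println(addr.getCanonicalHostName());
    }
}
```

If reverse DNS is broken or absent, `getCanonicalHostName` falls back to the textual IP, which is one reason Docker setups behave differently from DNS-backed clusters.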

@smola
Contributor Author

smola commented Jan 12, 2015

@pwendell Right. The problem is that there is no way to force the use of a given IP (ignoring reverse lookups or any other hostname/IP detection mechanisms).

I get this on Docker, where the default setup is something like this:

Spark worker:

  • IP: 172.17.0.11
  • Hostname: hashone

Spark driver:

  • IP: 172.17.0.12
  • Hostname: hashtwo

Spark worker cannot resolve hashtwo and Spark driver cannot resolve hashone. At some point, Spark worker throws an exception because it's trying to resolve hashtwo instead of just contacting 172.17.0.12.

@pwendell
Contributor

Yeah, we've also seen this issue in Docker environments. There is an alternative solution we just merged that allows overriding the reverse DNS lookup, and in our deployment we just set it directly to the IP.

#3893

Is that sufficient for your use case? The benefit with #3893 is that it doesn't change default behavior in the way this patch does.
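For reference, the workaround in a Docker-style deployment looks roughly like this. `SPARK_LOCAL_HOSTNAME` is the override variable as I read #3893, and `172.17.0.11` is just the example worker IP from this thread; adjust both to your environment.

```shell
# Advertise the container's reachable IP directly instead of its
# unresolvable hostname (the #3893 override, per my reading of that PR).
export SPARK_LOCAL_HOSTNAME="172.17.0.11"   # the worker's reachable IP
echo "$SPARK_LOCAL_HOSTNAME"
```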

@pwendell
Contributor

BTW - I also created this JIRA to try and clean up the way we deal with binding and advertised hostnames in Spark:

https://issues.apache.org/jira/browse/SPARK-5078

@smola
Contributor Author

smola commented Jan 12, 2015

@pwendell Thanks! #3893 is good for me. I'm closing this PR.

@smola smola closed this Jan 12, 2015
@nikonyrh

Hi, I would like to understand why slaves use SPARK_LOCAL_IP instead of SPARK_PUBLIC_DNS when talking to other slaves. It is also shown in the "Address" column of the Workers table on the Spark Master UI.

Earlier I thought that I could set SPARK_LOCAL_IP to the Docker container's IP and SPARK_PUBLIC_DNS to the host's IP, so that with port forwarding the other nodes and the driver could talk to slaves running inside containers. In this context the internal address isn't reachable from anywhere except the container's host OS.

Related to #3893
