
Why do the master and worker nodes expect the hostname to be statically set to "master"? #44

Open
LuqmanSahaf opened this issue Sep 30, 2014 · 0 comments

I am using the Spark 1.0.0 Docker images. When I start the master container with a hostname other than "master", it simply fails. Moreover, the worker nodes try to contact the master using the name "master" instead of the IP passed as a command-line argument to docker run. The startup script does update /etc/hadoop/core-site.xml, so why does the worker still contact the master node under the name "master"? Below are the logs of the master and a worker, respectively:
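
A workaround that should at least let both containers resolve the name, if I understand the failure correctly, is to make "master" resolvable on each side (a sketch, not something the images document; --add-host needs a Docker version that supports it, and the fixed -p mapping is my assumption so that port 7077 is reachable from the other machine):

```sh
# Master host (coreos-2): keep the hostname "master" so the Master can
# resolve the name it binds to, and publish the cluster port at a fixed
# host port so the other machine can reach it.
docker run -itP -p 7077:7077 -h master spark-master:1.0.0

# Worker host (coreos-1): map the name "master" to the master machine's
# address (10.132.232.22, the same IP already passed to the worker below).
docker run -P -h worker --add-host master:10.132.232.22 \
    spark-worker:1.0.0 10.132.232.22
```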

1- Master log with a hostname other than "master" (the first command below, with -h master, is shown for comparison; the failing run and its log follow):
core@coreos-2 ~ $ docker run -itP -h master spark-master:1.0.0

core@coreos-2 ~ $ docker run -itP spark-master:1.0.0
SPARK_HOME=/opt/spark-1.0.0
HOSTNAME=ad28c0356f17
TERM=xterm
SCALA_VERSION=2.10.3
PATH=/opt/spark-1.0.0:/opt/scala-2.10.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
SPARK_VERSION=1.0.0
PWD=/
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
SHLVL=1
HOME=/root
SCALA_HOME=/opt/scala-2.10.3
_=/usr/bin/env
MASTER_IP=172.17.0.2
preparing Spark
starting Hadoop Namenode
starting sshd
starting Spark Master
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-1.0.0-bin-hadoop1/sbin/../logs/spark-hdfs-org.apache.spark.deploy.master.Master-1-ad28c0356f17.out
Warning: SPARK_MEM is deprecated, please use a more specific config option
(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY).
Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp ::/opt/spark-1.0.0-bin-hadoop1/conf:/opt/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms800m -Xmx800m org.apache.spark.deploy.master.Master --ip master --port 7077 --webui-port 8080
========================================

14/09/30 09:19:19 INFO SecurityManager: Changing view acls to: hdfs
14/09/30 09:19:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs)
14/09/30 09:19:20 INFO Slf4jLogger: Slf4jLogger started
14/09/30 09:19:20 INFO Remoting: Starting remoting
Exception in thread "main" java.net.UnknownHostException: master: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
    at java.net.InetAddress.getAllByName(InetAddress.java:1162)
    at java.net.InetAddress.getAllByName(InetAddress.java:1098)
    at java.net.InetAddress.getByName(InetAddress.java:1048)
    at akka.remote.transport.netty.NettyTransport$$anonfun$addressToSocketAddress$1.apply(NettyTransport.scala:382)
    at akka.remote.transport.netty.NettyTransport$$anonfun$addressToSocketAddress$1.apply(NettyTransport.scala:382)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
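
Note the Spark Command line above: even though the container's hostname is ad28c0356f17, the start script still passes --ip master, so the JVM tries to resolve the literal name "master", which presumably has no /etc/hosts entry when the container is started without -h master. If the container is still up, this can be checked from the host (a sketch; docker exec and getent are assumed to be available, otherwise the image's sshd can be used):

```sh
# ad28c0356f17 is the container ID from the log above.
docker exec ad28c0356f17 cat /etc/hosts         # is there an entry for "master"?
docker exec ad28c0356f17 getent hosts master    # does the name resolve at all?
```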

2- Worker log:
core@coreos-1 ~/docker-scripts/spark-1.0.0/spark-worker $ docker run -P -h worker spark-worker:1.0.0 10.132.232.22

WORKER_IP=172.17.0.54
preparing Spark
starting Hadoop Datanode
 * Starting Apache Hadoop Data Node server hadoop-datanode
starting datanode, logging to /var/log/hadoop//hadoop--datanode-worker.out
   ...done.
starting sshd
starting Spark Worker
Warning: SPARK_MEM is deprecated, please use a more specific config option
(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY).
14/09/30 09:33:38 INFO SecurityManager: Changing view acls to: hdfs
14/09/30 09:33:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs)
14/09/30 09:33:39 INFO Slf4jLogger: Slf4jLogger started
14/09/30 09:33:40 INFO Remoting: Starting remoting
14/09/30 09:33:40 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@worker:48571]
14/09/30 09:33:40 INFO Worker: Starting Spark worker worker:48571 with 1 cores, 1500.0 MB RAM
14/09/30 09:33:40 INFO Worker: Spark home: /opt/spark-1.0.0
14/09/30 09:33:41 INFO WorkerWebUI: Started WorkerWebUI at http://worker:8081
14/09/30 09:33:41 INFO Worker: Connecting to master spark://master:7077...
14/09/30 09:33:41 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@master:7077]. Address is now gated for 60000 ms, all messages to this address will be delivered to dead letters.
14/09/30 09:33:41 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from Actor[akka://sparkWorker/user/Worker#-1054615506] to Actor[akka://sparkWorker/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/09/30 09:34:01 INFO Worker: Connecting to master spark://master:7077...
14/09/30 09:34:01 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from Actor[akka://sparkWorker/user/Worker#-1054615506] to Actor[akka://sparkWorker/deadLetters] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/09/30 09:34:21 INFO Worker: Connecting to master spark://master:7077...
14/09/30 09:34:21 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from Actor[akka://sparkWorker/user/Worker#-1054615506] to Actor[akka://sparkWorker/deadLetters] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/09/30 09:34:41 ERROR Worker: All masters are unresponsive! Giving up.

P.S.: The worker container runs on a different machine (coreos-1), so it cannot reach the master under the name "master", as there is no global discovery service to resolve it.
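
In other words, what the worker container ultimately needs is a mapping from the name "master" to an address reachable from coreos-1. As an alternative to --add-host, the entry can also be added inside the running worker container (a sketch, e.g. over its sshd; it assumes the master's port 7077 is actually published on coreos-2 at that port, and the worker process has to be restarted afterwards since it has already given up):

```sh
# 10.132.232.22 is the master machine's address from the run command above.
echo "10.132.232.22 master" >> /etc/hosts
```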
