[SPARK-6420] Driver's Block Manager does not use "spark.driver.host" in Yarn-Client mode #5095
Conversation
… Yarn-Client mode
Can one of the admins verify this patch?
@marsishandsome This seems like a bit of an odd use case. If spark.driver.host is set to something different than the actual host, errors are going to occur. Why do you want to set it rather than have Spark figure it out automatically? Is it just to make switching between modes easier?
@tgravescs In our local network, the YARN nodes do not know the hostname of the client. So I have to set spark.driver.host to the client's IP address, so that the driver will use its IP address instead of its hostname. But the driver's Block Manager will still use the hostname.
@tgravescs You are right. Maybe we should provide two choices, IP and hostname, both automatically figured out by Spark.
Sorry I don't follow what you mean. Are you just saying DNS is not properly configured or when the client goes to figure out hostname it comes back null? If you run the linux 'hostname' command what does it list? |
@tgravescs Yes, the DNS in our network is not properly configured: the YARN nodes cannot reach the client machine by its hostname. My idea is to let the YARN nodes use the client's IP address to connect to the client machine.
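For reference, a minimal sketch of what this setup looks like at submit time. The IP address, class name, and jar are placeholders, not values from this issue:

```shell
# Sketch: advertise the driver by IP instead of an unresolvable hostname
# when running in yarn-client mode (Spark 1.x syntax).
# 192.168.1.10, com.example.App, and app.jar are placeholders.
spark-submit \
  --master yarn-client \
  --conf spark.driver.host=192.168.1.10 \
  --class com.example.App \
  app.jar
```

The bug reported here is that the Block Manager ignores this setting and still advertises the client's hostname.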
@marsishandsome did you get it working on your network? Is this patch a false alarm? |
@andrewor14 Yes, it works in my network. |
You can probably just use the command "sudo hostname ip" to let Spark's Block Manager pick up the IP you want from the OS hostname.
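Spelling out that workaround as a sketch (the IP is a placeholder; this requires root and changes the hostname for the whole machine, not just Spark):

```shell
# Workaround sketch: set the OS hostname to the machine's reachable IP,
# so hostname-based lookups inside Spark resolve to that address.
sudo hostname 192.168.1.10   # placeholder for the client's IP

# Verify: subsequent hostname lookups now return the IP.
hostname
```

Note the change does not survive a reboot unless it is also written to the system's hostname configuration.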
@yongjiaw Thanks for your advice. As a workaround I set the hostname to the IP and it works. I like the solution in SPARK-5113.
@marsishandsome is this change needed then, or is that workaround actually the right way to do this?
@srowen This change is not needed. As SPARK-5331 notes, Spark exposes many services, both internal and external, and should provide a way for users to specify either a hostname or an IP.
In my cluster, the YARN nodes do not know the client's hostname,
so I set "spark.driver.host" to the IP address of the client.
But in yarn-client mode the driver's Block Manager ignores "spark.driver.host" and uses the hostname instead.
I got the following error:
TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2, hadoop-node1538098): java.io.IOException: Failed to connect to example-hostname
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:193)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:200)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1029)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:463)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:849)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:199)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:165)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more