[SPARK-21701][CORE] Enable RPC client to use `SO_RCVBUF` and `SO_SNDBUF` in SparkConf. #18964

Closed · wants to merge 1 commit into master

6 participants
@neoremind

neoremind commented Aug 16, 2017

What changes were proposed in this pull request?

TCP parameters like SO_RCVBUF and SO_SNDBUF can be set in SparkConf, and `org.apache.spark.network.server.TransportServer` uses them when building the server with Netty. For TransportClientFactory, however, there is no way to apply those parameters from SparkConf, which can leave the server and client sides inconsistently configured when users set these parameters. This PR enables the RPC client to use those TCP parameters as well.
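
For illustration, a minimal sketch of how a user might set these buffers for the RPC module in SparkConf; the `spark.rpc.io.*` keys follow the `spark.{module}.io.*` pattern discussed later in this thread, and the 128 KiB values are arbitrary:

```java
import org.apache.spark.SparkConf;

public class RpcBufferConfExample {
  public static void main(String[] args) {
    // Illustrative values only: request 128 KiB socket buffers for the RPC module.
    // Before this patch these keys only took effect on the server side; with it,
    // the client channels honor them too.
    SparkConf conf = new SparkConf()
        .setAppName("rpc-buffer-example")
        .set("spark.rpc.io.receiveBuffer", String.valueOf(128 * 1024)) // SO_RCVBUF
        .set("spark.rpc.io.sendBuffer", String.valueOf(128 * 1024));   // SO_SNDBUF
    System.out.println(conf.toDebugString());
  }
}
```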

How was this patch tested?

Existing tests.

@neoremind

neoremind commented Aug 16, 2017

@jerryshao please review the PR I separated out. Thanks!

@jerryshao

Contributor

jerryshao commented Aug 17, 2017

The change looks OK to me. Did you hit an issue where you had to change the buffer size on the client side?

@neoremind

neoremind commented Aug 17, 2017

Not yet; keeping the buffer sizes at the system defaults is fine, but honoring the values when a user chooses to set them keeps things consistent, so this makes sense.
I also noticed that Spark RPC uses Java native serialization by default, so even a simple request that just verifies whether an endpoint exists costs about 1 KB of payload, not to mention endpoints with real logic. It might be useful to profile this in the real world; providing RPC-related monitoring logs or hooks could be beneficial, but that should be discussed in another thread.
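
As a rough, standalone illustration of that overhead (not Spark code; `CheckExistence` is a hypothetical stand-in for an endpoint-verification message), Java serialization writes a full class descriptor into every stream, which dominates the size of tiny messages:

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializedSizeDemo {
  // Hypothetical stand-in for a small RPC message such as
  // "does this endpoint exist?".
  static class CheckExistence implements Serializable {
    private static final long serialVersionUID = 1L;
    String endpointName = "heartbeat-receiver";
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(new CheckExistence());
    }
    // Even a one-field message carries the class descriptor and field
    // metadata, so the payload is far larger than the data itself.
    System.out.println("serialized size: " + bytes.size() + " bytes");
  }
}
```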

@jiangxb1987

Contributor

jiangxb1987 commented Aug 17, 2017

This change looks reasonable, cc @zsxwing @cloud-fan for another look.

@neoremind

neoremind commented Aug 23, 2017

@cloud-fan would you take a look at the PR? The update is very simple. Thanks very much!

@zsxwing

Member

zsxwing commented Aug 23, 2017

I also noticed that Spark RPC uses Java native serialization by default, so even a simple request that just verifies whether an endpoint exists costs about 1 KB of payload, not to mention endpoints with real logic. It might be useful to profile this in the real world.

@neoremind did you see any performance issue caused by Spark RPC? Spark doesn't send a lot of RPC messages; I didn't see it become a bottleneck even when we tried to optimize latency in Structured Streaming.

@neoremind

neoremind commented Aug 24, 2017

@zsxwing I did try to create a performance test against Spark RPC; the test results can be found here. Note that I created the project for study purposes and the code is based on 2.1.0. As you said, performance does not drop when the client leaves SO_RCVBUF and SO_SNDBUF at their defaults instead of applying the values set in SparkConf.

For example, in a scenario with 10 concurrent callers and 100k total calls, keeping everything at the defaults, QPS is around 11k. When I set SO_RCVBUF and SO_SNDBUF to an extremely small number like 100, performance suffers tremendously; when they are set to a large number like 128k, the results are not boosted regardless of whether the client applies the corresponding SO_RCVBUF and SO_SNDBUF values. Also, many internal Spark endpoints use ThreadSafeRpcEndpoint, which does not allow concurrent calls. So I agree that Spark does not make many RPC calls and there are no performance issues.

I admit the update is trivial, but from a user's perspective, if spark.{module}.io.receiveBuffer and spark.{module}.io.sendBuffer are exposed and can be set, yet only take effect on the server side, that is a bit inconsistent. So I raised this PR to make them work on both the server and client side.

```diff
@@ -210,6 +210,14 @@ private TransportClient createClient(InetSocketAddress address)
       .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, conf.connectionTimeoutMs())
       .option(ChannelOption.ALLOCATOR, pooledAllocator);
     if (conf.receiveBuf() > 0) {
       bootstrap.option(ChannelOption.SO_RCVBUF, conf.receiveBuf());
```

@cloud-fan

cloud-fan Aug 24, 2017

Contributor

What's the difference between `option` and `childOption`?

@neoremind

neoremind Aug 24, 2017

On the server side, Netty follows the reactor pattern and a ServerBootstrap is used: `option` applies to the parent (listening) channel during binding and accepting, while `childOption` applies to each child channel created once a connection is established, one channel per client.
On the client side, Spark RPC uses a plain Bootstrap, which has no `childOption` method because the client does not work like the server; it is much simpler, with only one thread pool used by Netty to do the network I/O.
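
For illustration (not Spark's actual code), a minimal Netty sketch of where the two methods apply; the buffer sizes, backlog, and empty handler are placeholders:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

public class OptionVsChildOption {
  public static void main(String[] args) {
    ChannelInitializer<SocketChannel> init = new ChannelInitializer<SocketChannel>() {
      @Override
      protected void initChannel(SocketChannel ch) {
        // Handlers would be added to ch.pipeline() here.
      }
    };

    // Server side: option(...) configures the listening (parent) channel used
    // for bind/accept; childOption(...) configures each accepted per-client channel.
    ServerBootstrap server = new ServerBootstrap()
        .group(new NioEventLoopGroup(1), new NioEventLoopGroup())
        .channel(NioServerSocketChannel.class)
        .option(ChannelOption.SO_BACKLOG, 128)             // parent channel
        .childOption(ChannelOption.SO_RCVBUF, 128 * 1024)  // per-connection channels
        .childOption(ChannelOption.SO_SNDBUF, 128 * 1024)
        .childHandler(init);

    // Client side: Bootstrap has one channel per connection and no parent/child
    // split, so option(...) is the only knob; there is no childOption.
    Bootstrap client = new Bootstrap()
        .group(new NioEventLoopGroup())
        .channel(NioSocketChannel.class)
        .option(ChannelOption.SO_RCVBUF, 128 * 1024)
        .option(ChannelOption.SO_SNDBUF, 128 * 1024)
        .handler(init);

    // Sketch only: bind()/connect() omitted; release the event loops.
    server.config().group().shutdownGracefully();
    server.config().childGroup().shutdownGracefully();
    client.config().group().shutdownGracefully();
  }
}
```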

@zsxwing

Member

zsxwing commented Aug 24, 2017

ok to test

@zsxwing

Member

zsxwing commented Aug 24, 2017

LGTM

@zsxwing

Member

zsxwing commented Aug 24, 2017

@neoremind that's an interesting project. However, Spark RPC is not designed to be a high-performance, general-purpose RPC system; Spark just needs a good-enough RPC layer. That's why it uses Java serialization.

@neoremind

neoremind commented Aug 24, 2017

@zsxwing Thanks for reviewing. The project I mentioned above is for study purposes, and I hope it helps others who are interested. I totally agree that Spark RPC is mainly for internal use, but as I tested, its performance is good in general cases, which is good news :)

@SparkQA

SparkQA commented Aug 24, 2017

Test build #81097 has finished for PR 18964 at commit 14ba13b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing

Member

zsxwing commented Aug 24, 2017

Thanks. Merging to master.

asfgit closed this in 763b83e on Aug 24, 2017

ptkool added a commit to ptkool/spark that referenced this pull request Nov 13, 2017

[SPARK-21701][CORE] Enable RPC client to use `SO_RCVBUF` and `SO_SNDBUF` in SparkConf.

## What changes were proposed in this pull request?

TCP parameters like SO_RCVBUF and SO_SNDBUF can be set in SparkConf, and `org.apache.spark.network.server.TransportServer` uses them when building the server with Netty. For TransportClientFactory, however, there is no way to apply those parameters from SparkConf, which can leave the server and client sides inconsistently configured when users set these parameters. This PR enables the RPC client to use those TCP parameters as well.

## How was this patch tested?

Existing tests.

Author: xu.zhang <xu.zhang@hulu.com>

Closes apache#18964 from neoremind/add_client_param.