Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26632][Core] Separate Thread Configurations of Driver and Executor #23560

Closed
wants to merge 24 commits into from

Conversation

jiafuzha
Copy link
Contributor

@jiafuzha jiafuzha commented Jan 16, 2019

What changes were proposed in this pull request?

For the below three thread configuration items applied to both driver and executor,
spark.rpc.io.serverThreads
spark.rpc.io.clientThreads
spark.rpc.netty.dispatcher.numThreads,
we separate them to driver specifics and executor specifics.
spark.driver.rpc.io.serverThreads < - > spark.executor.rpc.io.serverThreads
spark.driver.rpc.io.clientThreads < - > spark.executor.rpc.io.clientThreads
spark.driver.rpc.netty.dispatcher.numThreads < - > spark.executor.rpc.netty.dispatcher.numThreads

Spark reads these specifics first and fall back to the common configurations.

How was this patch tested?

We ran the SimpleMap app without shuffle for benchmark purpose to test Spark's scalability in HPC with omini-path NIC which has higher bandwidth than normal ethernet NIC.

Spark's base version is 2.4.0.
Spark ran in the Standalone mode. Driver was in a standalone node.
After the separation, the performance is improved a lot in 256 nodes and 512 nodes. see below test results of SimpleMapTask before and after the enhancement. You can view the tables in the JIRA too.

ds: spark.driver.rpc.io.serverThreads
dc: spark.driver.rpc.io.clientThreads
dd: spark.driver.rpc.netty.dispatcher.numThreads
ed: spark.executor.rpc.netty.dispatcher.numThreads
time: Overall Time (s)
old time: Overall Time without Separation (s)

Before:

nodes ds dc dd ed time
128 nodes 8 8 8 8 108
256 nodes 8 8 8 8 196
512 nodes 8 8 8 8 377

After:

nodes ds dc dd ed time improvement
128 nodes 15 15 10 30 107 0.9%
256 nodes 12 15 10 30 159 18.8%
512 nodes 12 15 10 30 283 24.9%

Copy link
Contributor

@carsonwang carsonwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very useful in RPC performance tuning, especially on large scale. Can you please fix a few style issue?

@beliefer
Copy link
Contributor

This pr looks very clear.

@carsonwang
Copy link
Contributor

@zjf2012 , can you please update the PR description with the performance test result?

@carsonwang
Copy link
Contributor

cc @cloud-fan @vanzin @jerryshao, can you please help review this? In our test, configuring RPC threads separately for driver and executor are very usefully for performance, especially for large scale.

@cloud-fan
Copy link
Contributor

cc @zsxwing

@jerryshao
Copy link
Contributor

Can you please fix the conflict?

@jerryshao
Copy link
Contributor

ok to test.

@jerryshao
Copy link
Contributor

Maybe we should also update the docs.

@SparkQA
Copy link

SparkQA commented Jan 24, 2019

Test build #101625 has finished for PR 23560 at commit d55d6cb.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 25, 2019

Test build #101660 has finished for PR 23560 at commit ecab1cc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiafuzha
Copy link
Contributor Author

@jerryshao , I checked configuration.md as well as other related md files under the docs folder. I cannot find a proper place to put my documentation. Actually, the spark documentation even doesn't mention IO threads-related configurations. Would you please give some suggestion?

thanks.

@jerryshao
Copy link
Contributor

OK, I see.

@SparkQA
Copy link

SparkQA commented Jan 29, 2019

Test build #101804 has finished for PR 23560 at commit 7519af3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Mar 13, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Mar 13, 2019

Test build #103464 has started for PR 23560 at commit d73cfb5.

@vanzin
Copy link
Contributor

vanzin commented Mar 13, 2019

@VaibhavDesai ?!?!?

@VaibhavDesai
Copy link

@VaibhavDesai ?!?!?

wont happen again!

@SparkQA
Copy link

SparkQA commented Mar 14, 2019

Test build #103489 has finished for PR 23560 at commit cb4e5aa.

  • This patch fails Java style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 14, 2019

Test build #103491 has finished for PR 23560 at commit e4b1023.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiafuzha
Copy link
Contributor Author

@vanzin, Is this ticket ok for you to merge? any more comments? thanks.

Copy link
Contributor

@vanzin vanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid adding comments that just repeat what the code is doing.

@vanzin
Copy link
Contributor

vanzin commented Apr 29, 2019

Also it would be good to update configuration.md so people actually get to know about this.

@jiafuzha
Copy link
Contributor Author

@vanzin , I tried to give some documentation about this PR. Please help review.

@jiafuzha
Copy link
Contributor Author

@carsonwang , please help review.

@SparkQA
Copy link

SparkQA commented Apr 30, 2019

Test build #105023 has finished for PR 23560 at commit cb27a2c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 6, 2019

Test build #105138 has finished for PR 23560 at commit 51f78ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


private[netty] val transportConf = SparkTransportConf.fromSparkConf(
conf.clone.set(RPC_IO_NUM_CONNECTIONS_PER_PEER, 1),
"rpc",
conf.get(RPC_IO_THREADS).getOrElse(numUsableCores))
conf.get(RPC_IO_THREADS).getOrElse(numUsableCores),
role)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not necessary to separate into another line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine (one arg per line).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jerryshao , The guide says,

For method declarations, use 4 space indentation for their parameters and put each in each line when the parameters don't fit in two lines.

I'll keep this code.


private[netty] val transportConf = SparkTransportConf.fromSparkConf(
conf.clone.set(RPC_IO_NUM_CONNECTIONS_PER_PEER, 1),
"rpc",
conf.get(RPC_IO_THREADS).getOrElse(numUsableCores))
conf.get(RPC_IO_THREADS).getOrElse(numUsableCores),
role)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine (one arg per line).

docs/configuration.md Outdated Show resolved Hide resolved
docs/configuration.md Outdated Show resolved Hide resolved
docs/configuration.md Outdated Show resolved Hide resolved
@SparkQA
Copy link

SparkQA commented May 7, 2019

Test build #105180 has finished for PR 23560 at commit 6bdfe6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented May 10, 2019

retest this please

@SparkQA
Copy link

SparkQA commented May 10, 2019

Test build #105299 has finished for PR 23560 at commit 6bdfe6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented May 10, 2019

Merging to master.

@vanzin vanzin closed this in fa5dc0a May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet