[SPARK-19486][CORE](try 3) Investigate using multiple threads for task serialization #17139

Closed
wants to merge 2 commits into from


witgo
Contributor

@witgo witgo commented Mar 2, 2017

What changes were proposed in this pull request?

See https://issues.apache.org/jira/browse/SPARK-19486

When a stage has a large number of tasks, this PR can improve scheduling performance by about 15%.

The test code:

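// Repartition into 100,000 partitions so that each job schedules 100,000 tasks.
val rdd = sc.parallelize(0 until 100).repartition(100000)
// Mark the RDD for local checkpointing and materialize it.
rdd.localCheckpoint().count()
// Warm-up run; not timed.
rdd.sum()
(1 to 10).foreach { _ =>
  val serializeStart = System.nanoTime()
  rdd.sum()
  val serializeFinish = System.nanoTime()
  // Elapsed wall-clock time of the whole job, in seconds.
  println(f"${(serializeFinish - serializeStart) / 1E9}%1.4f")
}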

and the spark-defaults.conf file:

spark.master                                      yarn-client
spark.executor.instances                          20
spark.driver.memory                               64g
spark.executor.memory                             30g
spark.executor.cores                              5
spark.default.parallelism                         100 
spark.sql.shuffle.partitions                      100
spark.serializer                                  org.apache.spark.serializer.KryoSerializer
spark.driver.maxResultSize                        0
spark.ui.enabled                                  false 
spark.driver.extraJavaOptions                     -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=512M 
spark.executor.extraJavaOptions                   -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M 
spark.cleaner.referenceTracking.blocking          true
spark.cleaner.referenceTracking.blocking.shuffle  true

The test results are as follows:

Partitions  SPARK-18890  db0ddce
100         0.0273 s     0.028 s
1K          0.1234 s     0.1321 s
10K         0.6557 s     0.9502 s
100K        6.1541 s     9.4179 s
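
To make the table easier to compare against the headline figure, the relative reduction per row can be computed with a few lines of Scala. This is only a sketch: it assumes the SPARK-18890 column holds the timings with this patch and the db0ddce column holds the baseline, which the description does not state explicitly. The object and method names are made up for illustration.

```scala
object SchedulingSpeedup {
  // (partitions, seconds in the SPARK-18890 column, seconds in the db0ddce column),
  // copied from the results table. Which column is the baseline is an assumption.
  val results: Seq[(String, Double, Double)] = Seq(
    ("100", 0.0273, 0.028),
    ("1K", 0.1234, 0.1321),
    ("10K", 0.6557, 0.9502),
    ("100K", 6.1541, 9.4179)
  )

  // Relative reduction in elapsed time, as a percentage of the baseline time.
  def reductionPercent(patched: Double, baseline: Double): Double =
    (baseline - patched) / baseline * 100.0

  def main(args: Array[String]): Unit =
    results.foreach { case (partitions, patched, baseline) =>
      // Print one row per partition count with the reduction rounded to one decimal.
      println(f"$partitions%-5s ${reductionPercent(patched, baseline)}%.1f%%")
    }
}
```

Under that assumption the reduction grows with the partition count, so a single percentage only describes one job size.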

How was this patch tested?

Existing tests.

@witgo witgo force-pushed the SPARK-18890-multi-threading branch 3 times, most recently from a06f8c8 to 6874d1e Compare March 2, 2017 15:58
@SparkQA

SparkQA commented Mar 2, 2017

Test build #73775 has finished for PR 17139 at commit a06f8c8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializeTask(task: TaskDescription) extends CoarseGrainedClusterMessage
  • class SerializeTaskEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint with Logging

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73776 has finished for PR 17139 at commit 6874d1e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializeTask(task: TaskDescription) extends CoarseGrainedClusterMessage
  • class SerializeTaskEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint with Logging

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73769 has finished for PR 17139 at commit bfa285b.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializeTask(task: TaskDescription) extends CoarseGrainedClusterMessage
  • class SerializeTaskEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint with Logging

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73771 has finished for PR 17139 at commit af5fc9f.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializeTask(task: TaskDescription) extends CoarseGrainedClusterMessage
  • class SerializeTaskEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint with Logging

@witgo witgo force-pushed the SPARK-18890-multi-threading branch from 6874d1e to 4d9b666 Compare March 3, 2017 14:25
@SparkQA

SparkQA commented Mar 3, 2017

Test build #73848 has finished for PR 17139 at commit 4d9b666.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializeTask(task: TaskDescription) extends CoarseGrainedClusterMessage
  • class SerializeTaskEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint with Logging

@witgo witgo changed the title from "[WIP][SPARK-18890][CORE](try 3) Move task serialization from the TaskSetManager to the CoarseGrainedSchedulerBackend" to "[SPARK-18890][CORE](try 3) Move task serialization from the TaskSetManager to the CoarseGrainedSchedulerBackend" Mar 4, 2017
@witgo
Contributor Author

witgo commented Mar 8, 2017

ping @kayousterhout @squito

@kayousterhout
Contributor

Why is the time improvement so much larger here than in the other PR?

@witgo
Contributor Author

witgo commented Mar 8, 2017

This PR also adds multi-threaded serialization of TaskDescription.
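
For readers unfamiliar with the idea, the general technique of serializing many task descriptions on a thread pool can be sketched as below. This is not the code from this PR: `FakeTask` is a hypothetical stand-in for Spark's `TaskDescription`, `ParallelSerialization` and `serializeAll` are made-up names, and plain Java serialization is used instead of Spark's own serializer.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.nio.ByteBuffer
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for Spark's TaskDescription; not the real class.
final case class FakeTask(taskId: Long, name: String)

object ParallelSerialization {
  // Serialize a single task into a ByteBuffer using plain Java serialization.
  def serialize(task: FakeTask): ByteBuffer = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(task) finally out.close()
    ByteBuffer.wrap(bytes.toByteArray)
  }

  // Serialize all tasks on a fixed-size thread pool; results keep the input order.
  def serializeAll(tasks: Seq[FakeTask], numThreads: Int): Seq[ByteBuffer] = {
    val pool = Executors.newFixedThreadPool(numThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One future per task; the pool bounds how many run concurrently.
      val futures = tasks.map(t => Future(serialize(t)))
      Await.result(Future.sequence(futures), 1.minute)
    } finally pool.shutdown()
  }
}
```

A call such as `ParallelSerialization.serializeAll(tasks, 4)` returns one buffer per task. Whether this helps in practice depends on how expensive each serialization is relative to the cost of handing work to the pool.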

@kayousterhout
Contributor

Can you also post the time differences for some smaller jobs (e.g., 100 tasks, 1000 tasks, 10K tasks) to get a sense of how this varies with size?

@witgo
Contributor Author

witgo commented Mar 9, 2017

@kayousterhout The test report has been updated.

@witgo witgo force-pushed the SPARK-18890-multi-threading branch from 4d9b666 to 8fbe15c Compare March 9, 2017 15:05
@SparkQA

SparkQA commented Mar 9, 2017

Test build #74270 has finished for PR 17139 at commit 8fbe15c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@witgo witgo changed the title from "[SPARK-18890][CORE](try 3) Move task serialization from the TaskSetManager to the CoarseGrainedSchedulerBackend" to "[SPARK-19486][CORE](try 3) Investigate using multiple threads for task serialization" Mar 18, 2017
@jiangxb1987
Contributor

@witgo Are you still working on this?

@witgo
Contributor Author

witgo commented Jun 22, 2017

@jiangxb1987 Yes. Do you have any questions?

@jiangxb1987
Contributor

Please bring this PR up to date; then maybe someone can review it. :)

@srowen
Member

srowen commented Jun 22, 2017

This should probably just be closed.

@srowen srowen mentioned this pull request Jun 25, 2017
@gatorsmile
Member

We are closing it due to inactivity. Please do reopen it if you want to push it forward. Thanks!

@asfgit asfgit closed this in b32bd00 Jun 27, 2017