[#630] Disable the localShuffleReader by default in Spark3 client
zuston committed Feb 21, 2023
1 parent e20fb62 commit e81cfd0
Showing 2 changed files with 6 additions and 0 deletions.
@@ -201,6 +201,8 @@ public RssShuffleManager(SparkConf conf, boolean isDriver) {
// External shuffle service is not supported when using remote shuffle service
sparkConf.set("spark.shuffle.service.enabled", "false");
LOG.info("Disable external shuffle service in RssShuffleManager.");
sparkConf.set("spark.sql.adaptive.localShuffleReader.enabled", "false");
LOG.info("Disable local shuffle reader in RssShuffleManager.");
taskToSuccessBlockIds = Maps.newConcurrentMap();
taskToFailedBlockIds = Maps.newConcurrentMap();
// for non-driver executor, start a thread for sending shuffle data to shuffle server
4 changes: 4 additions & 0 deletions docs/client_guide.md
@@ -80,6 +80,10 @@ and Continuous partition assignment mechanism.
# Default value is 1.0. It estimates task concurrency: the fraction of the resources between spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors that is likely to be allocated
--conf spark.rss.estimate.task.concurrency.dynamic.factor=1.0
```

3. In `RssShuffleManager`, the local shuffle reader has been disabled by default (`spark.sql.adaptive.localShuffleReader.enabled=false`), as it would cause too many random small IOs and too many network connections with shuffle servers.
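The doc note above can be illustrated with a `spark-submit` sketch in the same `--conf` style as the earlier examples. This is a hedged sketch, not part of the commit: the coordinator address and application jar are placeholders, and the local-shuffle-reader flag only mirrors what `RssShuffleManager` now forces programmatically, so passing it explicitly is redundant and shown for documentation purposes only.

```shell
# Sketch: launching a Spark 3 job with the remote shuffle service client.
# <coordinatorIp> and your_spark_app.jar are hypothetical placeholders.
spark-submit \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager \
  --conf spark.rss.coordinator.quorum=<coordinatorIp>:19999 \
  --conf spark.sql.adaptive.localShuffleReader.enabled=false \
  your_spark_app.jar
```

Forcing the flag inside `RssShuffleManager` rather than relying on user configuration guarantees the behavior even when jobs are submitted with AQE defaults, which enable the local shuffle reader in Spark 3.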


### Deploy MapReduce Client Plugin

1. Add client jar to the classpath of each NodeManager, e.g., <HADOOP>/share/hadoop/mapreduce/
