[#630] Disable the localShuffleReader by default in Spark3 client
zuston committed Feb 21, 2023
1 parent e20fb62 commit e81cfd0
Showing 2 changed files with 6 additions and 0 deletions.
@@ -201,6 +201,8 @@ public RssShuffleManager(SparkConf conf, boolean isDriver) {
// External shuffle service is not supported when using remote shuffle service
sparkConf.set("spark.shuffle.service.enabled", "false");
LOG.info("Disable external shuffle service in RssShuffleManager.");
sparkConf.set("spark.sql.adaptive.localShuffleReader.enabled", "false");
LOG.info("Disable local shuffle reader in RssShuffleManager.");
taskToSuccessBlockIds = Maps.newConcurrentMap();
taskToFailedBlockIds = Maps.newConcurrentMap();
// for non-driver executor, start a thread for sending shuffle data to shuffle server
4 changes: 4 additions & 0 deletions docs/client_guide.md
@@ -80,6 +80,10 @@ and Continuous partition assignment mechanism.
# Default value is 1.0. It estimates task concurrency: the fraction of the resources between spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors that is likely to be allocated
--conf spark.rss.estimate.task.concurrency.dynamic.factor=1.0
```

3. In `RssShuffleManager`, the local shuffle reader has been disabled by default (`spark.sql.adaptive.localShuffleReader.enabled=false`), as it would cause too many random small IOs and too many network connections with shuffle servers.
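The doc note above can be illustrated with a `spark-submit` sketch in the same `--conf` style as the earlier examples. This is a hedged sketch, not part of the commit: the coordinator address and application jar are placeholders, and the local-shuffle-reader flag only mirrors what `RssShuffleManager` now forces programmatically, so passing it explicitly is redundant and shown for documentation purposes only.

```shell
# Sketch: launching a Spark 3 job with the remote shuffle service client.
# <coordinatorIp> and your_spark_app.jar are hypothetical placeholders.
spark-submit \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager \
  --conf spark.rss.coordinator.quorum=<coordinatorIp>:19999 \
  --conf spark.sql.adaptive.localShuffleReader.enabled=false \
  your_spark_app.jar
```

Forcing the flag inside `RssShuffleManager` rather than relying on user configuration guarantees the behavior even when jobs are submitted with AQE defaults, which enable the local shuffle reader in Spark 3.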


### Deploy MapReduce Client Plugin

1. Add client jar to the classpath of each NodeManager, e.g., <HADOOP>/share/hadoop/mapreduce/
